Abstract

We consider the problem of maximizing a nonnegative (possibly non-monotone) submodular set function with or without constraints. Feige et al. [9] showed a 2/5-approximation for the unconstrained problem and also proved that no approximation better than 1/2 is possible in the value oracle model. Constant-factor approximation was also given for submodular maximization subject to a matroid independence constraint (a factor of 0.309 [34]) and for submodular maximization subject to a matroid base constraint, provided that the fractional base packing number is at least 2 (a 1/4-approximation [34]).

In this paper, we propose a new algorithm for submodular maximization which is based on the idea of simulated annealing. We prove that this algorithm achieves improved approximation for two problems: a 0.41-approximation for unconstrained submodular maximization, and a 0.325-approximation for submodular maximization subject to a matroid independence constraint.

On the hardness side, we show that in the value oracle model it is impossible to achieve a 0.478-approximation for submodular maximization subject to a matroid independence constraint, or a 0.394-approximation subject to a matroid base constraint in matroids with two disjoint bases. Even for the special case of cardinality constraint, we prove it is impossible to achieve a 0.491-approximation. (Previously it was conceivable that a 1/2-approximation exists for these problems.) It is still an open question whether a 1/2-approximation is possible for unconstrained submodular maximization.
1 Introduction

A function $f : 2^X \to \mathbb{R}$ is called submodular if for any $S, T \subseteq X$, $f(S \cup T) + f(S \cap T) \le f(S) + f(T)$. In this paper, we consider the problem of maximizing a nonnegative submodular function. This means, given a submodular function $f : 2^X \to \mathbb{R}_+$, find a set $S \subseteq X$ (possibly under some constraints) maximizing $f(S)$. We assume value oracle access to the submodular function; i.e., for a given set $S$, the algorithm can query an oracle to find its value $f(S)$.
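To make the definition concrete, here is a brute-force check of the submodular inequality over all pairs of subsets of a tiny ground set. It is only an illustrative sketch: it enumerates all $4^{|X|}$ pairs, so it is feasible only for very small $X$, and the helper names (`is_submodular`, `cut_function`) are hypothetical, not from the paper. A graph cut function passes the test, while $f(S) = |S|^2$ fails it.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, Sequence

Subset = FrozenSet[int]

def all_subsets(ground: Sequence[int]) -> Iterator[Subset]:
    """Every subset of the ground set, as frozensets."""
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)

def is_submodular(f: Callable[[Subset], float], ground: Sequence[int], tol: float = 1e-12) -> bool:
    """Brute-force test of f(S | T) + f(S & T) <= f(S) + f(T) over all pairs (S, T)."""
    subsets = list(all_subsets(ground))
    return all(f(S | T) + f(S & T) <= f(S) + f(T) + tol for S in subsets for T in subsets)

def cut_function(edges):
    """Undirected graph cut: number of edges with exactly one endpoint in S."""
    return lambda S: sum(1 for u, v in edges if (u in S) != (v in S))

triangle = [(0, 1), (1, 2), (0, 2)]
print(is_submodular(cut_function(triangle), range(3)))   # True
print(is_submodular(lambda S: len(S) ** 2, range(3)))    # False
```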

Background. Submodular functions have been studied for a long time in the context of combinatorial optimization. Lovász in his seminal paper [26] discussed various properties of submodular functions and noted that they exhibit certain properties reminiscent of convex functions, namely the fact that a naturally defined extension of a submodular function to a continuous function (the "Lovász extension") is convex. This point of view explains why submodular functions can be minimized efficiently [?] [?] [29].

On the other hand, submodular functions also exhibit properties closer to concavity; for example, a function $f(S) = \phi(|S|)$ is submodular if and only if $\phi$ is concave. However, the problem of maximizing a submodular function captures problems such as Max Cut [14] and Max $k$-cover [7], which are NP-hard. Hence, we cannot expect to maximize a submodular function exactly; still, the structure of submodular functions (in particular, the "concave aspect" of submodularity) makes it possible to achieve non-trivial results for maximization problems. Instead of the Lovász extension, the construct which turns out to be useful for maximization problems is the multilinear extension, introduced in [3]. This extension has been used to design an optimal $(1 - 1/e)$-approximation for the problem of maximizing a monotone submodular function subject to a matroid independence constraint [33] [5], improving the greedy 1/2-approximation of Fisher, Nemhauser and Wolsey [10]. In contrast to the Lovász extension, the multilinear extension captures the concave as well as convex aspects of submodularity. A number of improved results followed for maximizing monotone submodular functions subject to various constraints [33] [13] [?] [?].

This paper is concerned with submodular functions which are not necessarily monotone. We only assume that the function is nonnegative.¹ The problem of maximizing a nonnegative submodular function has been studied in the operations research community, with many heuristic solutions proposed: data-correcting search methods [?] [?] [?], accelerated greedy algorithms [28], and polyhedral algorithms [25]. The first algorithms with provable performance guarantees for this problem were given by Feige, Mirrokni and Vondrák [9]. They presented several algorithms achieving constant-factor approximation, the best approximation factor being 2/5 (by a randomized local search algorithm). They also proved that a better than 1/2 approximation for submodular maximization would require exponentially many queries in the value oracle model. This is true even for symmetric submodular functions, in which case a 1/2-approximation is easy to achieve [9].

Recently, approximation algorithms have been designed for nonnegative submodular maximization subject to various constraints [23] [21] [31] [?]. (Submodular minimization subject to additional constraints has also been studied [31] [13] [?].) The results most relevant to this work are that nonnegative submodular functions can be maximized subject to a matroid independence constraint within a factor of 0.309, while a better than 1/2-approximation is impossible [34], and that there is a $\frac{1}{2}(1 - \frac{1}{\nu} - o(1))$-approximation subject to a matroid base constraint for matroids of fractional base packing number at least $\nu \in [1, 2]$, while a better than $(1 - \frac{1}{\nu})$-approximation in this setting is impossible [34]. For explicitly represented instances of unconstrained submodular maximization, Austrin [1] recently proved that, assuming the Unique Games Conjecture, the problem is NP-hard to approximate within a factor of 0.695.

Our results. In this paper, we propose a new algorithm for submodular maximization, using the concept of simulated annealing. The main idea is to perform a local search under a certain amount of random noise which gradually decreases to zero. This helps avoid bad local optima at the beginning, and provides gradually more and more refined local search towards the end. Algorithms of this type have been widely employed for difficult optimization problems, but they are notoriously difficult to analyze.



¹For submodular functions without any restrictions, verifying whether the maximum of the function is greater than zero or not requires exponentially many queries. Thus, no approximation algorithm is possible for this problem.



We prove that the simulated annealing algorithm achieves a 0.41-approximation for the maximization of any nonnegative submodular function without constraints, improving upon the previously known 0.4-approximation [9]. (Although our initial hope was that this algorithm might achieve a 1/2-approximation, we found an example where it achieves only a factor of $17/35 \approx 0.486$; see Appendix C.) We also prove that a similar algorithm achieves a 0.325-approximation for the maximization of a nonnegative submodular function subject to a matroid independence constraint (improving the previously known factor of 0.309 [34]).

On the hardness side, we show the following results in the value oracle model: For submodular maximization under a matroid base constraint, it is impossible to achieve a 0.394-approximation even in the special case when the matroid contains two disjoint bases. For maximizing a nonnegative submodular function subject to a matroid independence constraint, we prove it is impossible to achieve a 0.478-approximation. For the special case of a cardinality constraint ($\max\{f(S) : |S| \le k\}$ or $\max\{f(S) : |S| = k\}$), we prove a hardness threshold of 0.491. We remark that only a hardness of $(1/2 + \epsilon)$-approximation was known for all these problems prior to this work. For matroids of fractional base packing number $\nu = k/(k-1)$, $k \in \mathbb{Z}$, we show that submodular maximization subject to a matroid base constraint does not admit a $(1 - e^{-1/k} + \epsilon)$-approximation for any $\epsilon > 0$, improving the previously known threshold of $1 - 1/\nu = 1/k$. These results rely on the notion of a symmetry gap and the hardness construction of [34].



Problem                  Prior approximation   New approximation   New hardness   Prior hardness
max{f(S) : S ⊆ X}        0.4                   0.41                -              0.5
max{f(S) : |S| ≤ k}      0.309                 0.325               0.491          0.5
max{f(S) : |S| = k}      0.25                  -                   0.491          0.5
max{f(S) : S ∈ I}        0.309                 0.325               0.478          0.5
max{f(S) : S ∈ B}*       0.25                  -                   0.394          0.5

Figure 1: Summary of results: f(S) is nonnegative submodular, I denotes independent sets in a matroid, and B bases in a matroid. (*) In this line we assume the case where the matroid contains two disjoint bases. The hardness results hold in the value oracle model.

The rest of the paper is organized as follows. In Section 2 we discuss the notions of multilinear relaxation and simulated annealing, which form the basis of our algorithms. In Section 3 we describe and analyze our 0.41-approximation for unconstrained submodular maximization. In Section 4 we describe our 0.325-approximation for submodular maximization subject to a matroid independence constraint. In Section 5 we present our hardness results. Many details are deferred to the appendix.

2 Preliminaries 



Our algorithm combines the following two concepts. The first one is the multilinear relaxation, which has recently proved to be very useful for optimization problems involving submodular functions (see [4] [33] [?] [22] [23] [34]). The second concept is simulated annealing, which has been used successfully by practitioners dealing with difficult optimization problems. Simulated annealing provides good results in many practical scenarios, but typically eludes rigorous analysis (with several exceptions in the literature: see e.g. [2] for general convergence results, [27] [20] for applications to volume estimation and optimization over convex bodies, and [32] [3] for applications to counting problems).

Multilinear relaxation. Consider a submodular function $f : 2^X \to \mathbb{R}_+$. We define a continuous function $F : [0,1]^X \to \mathbb{R}_+$ as follows: For $\mathbf{x} \in [0,1]^X$, let $\hat{R} \subseteq X$ be a random set which contains each element $i$ independently with probability $x_i$. Then we define

$$F(\mathbf{x}) := \mathbb{E}[f(\hat{R})] = \sum_{S \subseteq X} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j).$$

This is the unique multilinear polynomial in $x_1, \ldots, x_n$ which coincides with $f(S)$ on the points $\mathbf{x} \in \{0,1\}^X$ (we identify such points with subsets $S \subseteq X$ in a natural way). Instead of the discrete optimization problem $\max\{f(S) : S \in \mathcal{F}\}$, where $\mathcal{F} \subseteq 2^X$ is the family of feasible sets, we consider a continuous optimization problem $\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$, where $P(\mathcal{F}) = \mathrm{conv}(\{\mathbf{1}_S : S \in \mathcal{F}\})$ is the polytope associated with $\mathcal{F}$. It is known due to [?] that any fractional solution $\mathbf{x} \in P(\mathcal{F})$, where $\mathcal{F}$ consists of either all subsets, or independent sets in a matroid, or matroid bases, can be rounded to an integral solution $S \in \mathcal{F}$ such that $f(S) \ge F(\mathbf{x})$. Our algorithm can be seen as a new way of approximately solving the relaxed problem $\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$.
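The definition of $F$ can be checked directly on a toy instance. The sketch below, an illustration with hypothetical helper names, evaluates $F(\mathbf{x})$ exactly by enumerating all $2^n$ sets and also estimates it by averaging $f$ over random sets $\hat{R}$, which is the kind of sampling estimate that replaces exact evaluation in practice. A directed cut function is used as an example of a nonnegative, non-monotone submodular function; for it, $F(\mathbf{x}) = \sum_{(u,v)} x_u(1 - x_v)$.

```python
import random
from itertools import product
from typing import Callable, FrozenSet, Sequence

def multilinear_exact(f: Callable[[FrozenSet[int]], float], x: Sequence[float]) -> float:
    """F(x) = sum_S f(S) prod_{i in S} x_i prod_{j not in S} (1 - x_j), over all 2^n sets."""
    total = 0.0
    for bits in product((0, 1), repeat=len(x)):
        weight = 1.0
        for xi, b in zip(x, bits):
            weight *= xi if b else 1.0 - xi
        total += weight * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def multilinear_sample(f, x, samples: int = 20000, seed: int = 0) -> float:
    """Unbiased estimate of F(x): average f over random R with Pr[i in R] = x_i independently."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        R = frozenset(i for i, xi in enumerate(x) if rng.random() < xi)
        acc += f(R)
    return acc / samples

# Directed cut of the arcs 0->1 and 1->2: f(S) = #arcs (u, v) with u in S and v not in S.
arcs = [(0, 1), (1, 2)]
f = lambda S: sum(1 for u, v in arcs if u in S and v not in S)
x = [0.9, 0.5, 0.2]
print(multilinear_exact(f, x))    # approximately 0.9*0.5 + 0.5*0.8 = 0.85
print(multilinear_sample(f, x))   # close to 0.85
```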

Simulated annealing. The idea of simulated annealing comes from physical processes such as gradual 
cooling of molten metals, whose goal is to achieve the state of lowest possible energy. The process starts at 
a high temperature and gradually cools down to a "frozen state". The main idea behind gradual cooling 
is that while it is natural for a physical system to seek a state of minimum energy, this is true only in a 
local sense: the system does not have any knowledge of the global structure of the search space. Thus a
low-temperature system would simply find a local optimum and get stuck there, which might be suboptimal. 
Starting the process at a high temperature means that there is more randomness in the behavior of the 
system. This gives the system more freedom to explore the search space, escape from bad local optima, and 
converge faster to a better solution. We pursue a similar strategy here. 

We should remark that our algorithm is somewhat different from a direct interpretation of simulated annealing. In simulated annealing, the system would typically evolve as a random walk, with sensitivity to the objective function depending on the current temperature. Here, we adopt a simplistic interpretation of temperature as follows. Given a set $A \subseteq X$ and $t \in [0,1]$, we define a probability distribution $\mathcal{R}_t(A)$ by starting from $A$ and adding/removing each element independently with probability $t$. Instead of the objective function evaluated on $A$, we consider the expectation over the distribution $\mathcal{R}_t(A)$. This corresponds to the noise operator used in the analysis of boolean functions, which was implicitly also used in the 2/5-approximation algorithm of [9]. Observe that $\mathbb{E}[f(\mathcal{R}_t(A))] = F((1-t)\mathbf{1}_A + t\mathbf{1}_{\bar{A}})$, where $F$ is the multilinear extension of $f$. The new idea here is that the parameter $t$ plays a role similar to temperature: e.g., $t = 1/2$ means that $\mathcal{R}_t(A)$ is uniformly random regardless of $A$ ("infinite temperature" in physics), while $t = 0$ means that there are no fluctuations present at all ("absolute zero").
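The identity $\mathbb{E}[f(\mathcal{R}_t(A))] = F((1-t)\mathbf{1}_A + t\mathbf{1}_{\bar{A}})$ can be verified by brute force on a small example. In the sketch below (illustrative, with hypothetical function names) both sides are computed exactly by enumeration, with no sampling, so they should agree up to floating-point rounding; the directed cut is just a convenient nonnegative submodular test function. The last line also shows that at $t = 1/2$ the value does not depend on $A$.

```python
from itertools import product

def noise_expectation(f, A, n, t):
    """E[f(R_t(A))]: flip the membership of each of the n elements independently with prob. t."""
    total = 0.0
    for flips in product((False, True), repeat=n):
        prob = 1.0
        for flip in flips:
            prob *= t if flip else 1.0 - t
        R = frozenset(i for i in range(n) if (i in A) != flips[i])
        total += prob * f(R)
    return total

def multilinear_exact(f, x):
    """F(x) = E[f(R)] where R contains element i independently with probability x_i."""
    total = 0.0
    for bits in product((False, True), repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

arcs = [(0, 1), (1, 2), (2, 0), (0, 3)]
f = lambda S: sum(1 for u, v in arcs if u in S and v not in S)   # directed cut
A, n, t = frozenset({0, 2}), 4, 0.3
x = [(1 - t) if i in A else t for i in range(n)]                  # (1-t) 1_A + t 1_{A-bar}
print(noise_expectation(f, A, n, t), multilinear_exact(f, x))     # the two values agree
print(noise_expectation(f, A, n, 0.5), noise_expectation(f, frozenset(), n, 0.5))  # equal at t = 1/2
```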

We use this interpretation to design an algorithm inspired by simulated annealing: Starting from $t = 1/2$, we perform local search on $A$ in order to maximize $\mathbb{E}[f(\mathcal{R}_t(A))]$. Note that for $t = 1/2$ this function does not depend on $A$ at all, and hence any solution is a local optimum. Then we start gradually decreasing $t$, while simultaneously running a local search with respect to $\mathbb{E}[f(\mathcal{R}_t(A))]$. Eventually, we reach $t = 0$, where the algorithm degenerates to a traditional local search and returns an (approximate) local optimum.

We emphasize that we maintain the solution generated by previous stages of the algorithm, as opposed 
to running a separate local search for each value of t. This is also used in the analysis, whose main point 
is to estimate how the solution improves as a function of t. It is not a coincidence that the approximation 
provided by our algorithm is a (slight) improvement over previous algorithms. Our algorithm can be viewed 
as a dynamic process which at each fixed temperature t corresponds to a certain variant of a previous 
algorithm. We prove that the performance of the simulated annealing process is described by a differential 
equation, whose initial condition can be related to the performance of a previously known algorithm. Hence 
the fact that an improvement can be achieved follows from the fact that the differential equation yields 
a positive drift at the initial point. The exact quantitative improvement depends on the solution of the 
differential equation, which we also present in this work. 

Notation. In this paper, we denote vectors consistently in boldface: for example $\mathbf{x}, \mathbf{y} \in [0,1]^n$. The coordinates of $\mathbf{x}$ are denoted by $x_1, \ldots, x_n$. Subscripts next to a boldface symbol, such as $\mathbf{x}_0, \mathbf{x}_1$, denote different vectors. In particular, we use the notation $\mathbf{x}_p(A)$ to denote a vector with coordinates $x_i = p$ for $i \in A$ and $x_i = 1 - p$ for $i \notin A$. In addition, we use the following notation to denote the value of certain fractional solutions:

$$\begin{array}{c|c|c} & C & \bar{C} \\ \hline A & p & p' \\ \hline B & q & q' \end{array} \;:=\; F\big(p\,\mathbf{1}_{A \cap C} + p'\,\mathbf{1}_{A \setminus C} + q\,\mathbf{1}_{B \cap C} + q'\,\mathbf{1}_{B \setminus C}\big).$$

For example, if $p = p'$ and $q = q' = 1 - p$, the diagram would represent $F(\mathbf{x}_p(A))$ (with $B = \bar{A}$). Typically, $A$ will be our current solution, and $C$ an optimal solution. Later we omit the symbols $A, B, C, \bar{C}$ from the diagram.
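As an illustration of the diagram notation, the sketch below builds the point whose value the diagram denotes, and confirms on a small example that $p = p'$, $q = q' = 1 - p$ with $B = \bar{A}$ gives exactly $\mathbf{x}_p(A)$. The function names are hypothetical, and $A$ and $B$ are assumed to be disjoint (as in the paper, where $B$ is the complement of $A$).

```python
def diagram_point(n, A, B, C, p, p_prime, q, q_prime):
    """Point whose F-value the 2x2 diagram denotes (A and B assumed disjoint).

    Row A: p on A & C and p' on A - C.  Row B: q on B & C and q' on B - C.
    Elements outside A | B stay 0; this does not arise when B is the complement of A."""
    x = [0.0] * n
    for i in range(n):
        if i in A:
            x[i] = p if i in C else p_prime
        elif i in B:
            x[i] = q if i in C else q_prime
    return x

def x_p(n, A, p):
    """x_p(A): coordinate p on A and 1 - p elsewhere."""
    return [p if i in A else 1.0 - p for i in range(n)]

n, A, C, p = 5, {0, 1, 2}, {1, 3}, 0.7
B = set(range(n)) - A
print(diagram_point(n, A, B, C, p, p, 1 - p, 1 - p) == x_p(n, A, p))   # True
print(diagram_point(n, A, B, C, 1, 1, 1, 0))   # indicator vector of A ∪ C
```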



3 Unconstrained Submodular Maximization 

Let us describe our algorithm for unconstrained submodular maximization. We use a parameter $p \in [\frac{1}{2}, 1]$, which is related to the "temperature" discussed above by $p = 1 - t$. We also use a fixed discretization parameter $\delta = 1/n^3$.

Algorithm 1 Simulated Annealing Algorithm For Submodular Maximization

Input: A submodular function $f : 2^X \to \mathbb{R}_+$.

Output: A subset $A \subseteq X$ satisfying $f(A) \ge 0.41 \cdot \max\{f(S) : S \subseteq X\}$.

1: Define $\mathbf{x}_p(A) = p\mathbf{1}_A + (1-p)\mathbf{1}_{\bar{A}}$.

2: $A \leftarrow \emptyset$.

3: for $p \leftarrow 1/2$; $p \le 1$; $p \leftarrow p + \delta$ do

4:     while there exists $i \in X$ such that $F(\mathbf{x}_p(A \Delta \{i\})) > F(\mathbf{x}_p(A))$ do

5:         $A \leftarrow A \Delta \{i\}$

6:     end while

7: end for

8: return the best solution among all sets $A$ and $\bar{A}$ encountered by the algorithm.
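For intuition, Algorithm 1 can be run literally on tiny instances if $F$ is evaluated exactly by enumerating all $2^n$ sets. The sketch below is an illustration under that assumption (it is exponential in $n$ and ignores the polynomial-time issues discussed next); the default $\delta = 1/n^3$, the tolerance `eps`, the random directed-cut test functions and all helper names are our own choices, not the paper's. The schedule is arranged so that the last stage is exactly $p = 1$, where $F(\mathbf{x}_1(A)) = f(A)$; the final set is then an exact local optimum of $f$, which by the local-search argument of [9] already gives $\max(f(A), f(\bar{A})) \ge OPT/3$. The usage example relies only on that weaker fact, not on the 0.41 guarantee.

```python
import random
from itertools import product

def multilinear(f, x):
    """Exact F(x) = E[f(R)], R containing each i independently with probability x_i (2^n terms)."""
    total = 0.0
    for bits in product((False, True), repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def simulated_annealing(f, n, delta=None, eps=1e-12):
    """Sketch of Algorithm 1 with exact F and exact local optima (exponential time, tiny n only)."""
    delta = 1.0 / n**3 if delta is None else delta
    ground = frozenset(range(n))
    F_p = lambda A, p: multilinear(f, [p if i in A else 1.0 - p for i in range(n)])  # F(x_p(A))
    schedule, p = [], 0.5
    while p < 1.0:                      # p = 1/2, 1/2 + delta, ..., and finally exactly p = 1
        schedule.append(p)
        p += delta
    schedule.append(1.0)
    A = frozenset()                     # Step 2
    best = max((A, ground - A), key=f)
    for p in schedule:                  # Step 3
        while True:                     # Steps 4-6: flip single elements while F(x_p(A)) improves
            current = F_p(A, p)
            flip = next((i for i in range(n) if F_p(A ^ {i}, p) > current + eps), None)
            if flip is None:
                break
            A = A ^ {flip}
            best = max((best, A, ground - A), key=f)
    return best                         # Step 8: best set among all A and their complements

def random_dicut(n, seed):
    """Random weighted directed cut: a nonnegative, non-monotone submodular test function."""
    rng = random.Random(seed)
    w = {(u, v): rng.random() for u in range(n) for v in range(n) if u != v}
    return lambda S: sum(c for (u, v), c in w.items() if u in S and v not in S)

def brute_force_opt(f, n):
    """max_S f(S) by enumeration, for checking results on tiny instances."""
    return max(f(frozenset(i for i in range(n) if bits[i])) for bits in product((0, 1), repeat=n))

arc = lambda S: 1.0 if (0 in S and 1 not in S) else 0.0   # single arc 0 -> 1
print(simulated_annealing(arc, 2))                         # frozenset({0})
g = random_dicut(4, seed=1)
print(g(simulated_annealing(g, 4)) / brute_force_opt(g, 4))   # ratio achieved on this instance
```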

We remark that this algorithm would not run in polynomial time, due to the complexity of finding a local optimum in Steps 4-6. This can be fixed by standard techniques (as in [?] [13] [?] [?]) by stopping when the conditions of local optimality are satisfied with sufficient accuracy. We also assume that we can evaluate the multilinear extension $F$, which can be done within a certain desired accuracy by random sampling. Since the analysis of the algorithm is already quite technical, we ignore these issues in this extended abstract and assume instead that a true local optimum is found in Steps 4-6.

Theorem 3.1. For any submodular function $f : 2^X \to \mathbb{R}_+$, Algorithm 1 returns with high probability a solution of value at least $0.41 \cdot OPT$, where $OPT = \max_{S \subseteq X} f(S)$.



In Theorem C.1 we also show that Algorithm 1 does not achieve any factor better than $17/35 \approx 0.486$. First, let us give an overview of our approach and compare it to the analysis of the 2/5-approximation in [9]. The algorithm of [9] can be viewed in our framework as follows: for a fixed value of $p$, it performs local search over points of the form $\mathbf{x}_p(A)$, with respect to element swaps in $A$, and returns a locally optimal solution. Using the conditions of local optimality, $F(\mathbf{x}_p(A))$ can be compared to the global optimum. Here, we observe the following additional property of a local optimum. If $\mathbf{x}_p(A)$ is a local optimum with respect to element swaps in $A$, then slightly increasing $p$ cannot decrease the value of $F(\mathbf{x}_p(A))$ (to first order). During the local search stage, the value cannot decrease either, so in fact the value of $F(\mathbf{x}_p(A))$ is non-decreasing throughout the algorithm. Moreover, we can derive bounds on $\frac{d}{dp}F(\mathbf{x}_p(A))$ depending on the value of the current solution. Consequently, unless the current solution is already valuable enough, we can conclude that an improvement can be achieved by increasing $p$. This leads to a differential equation whose solution implies Theorem 3.1.

We proceed slowly and first prove the basic fact that if $\mathbf{x}_p(A)$ is a local optimum for a fixed $p$, we cannot lose by increasing $p$ slightly. This is intuitive, because the gradient $\nabla F$ at $\mathbf{x}_p(A)$ must be pointing away from the center of the cube $[0,1]^X$, or else we could gain by a local step.

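Lemma 3.2. Let $p \in (\frac{1}{2}, 1]$ and suppose $\mathbf{x}_p(A)$ is a local optimum in the sense that $F(\mathbf{x}_p(A \Delta \{i\})) \le F(\mathbf{x}_p(A))$ for all $i$. Then

• $\frac{\partial F}{\partial x_i} \ge 0$ if $i \in A$, and $\frac{\partial F}{\partial x_i} \le 0$ if $i \notin A$;

• $\frac{d}{dp}F(\mathbf{x}_p(A)) = \sum_{i \in A}\frac{\partial F}{\partial x_i} - \sum_{i \notin A}\frac{\partial F}{\partial x_i} \ge 0$.

Proof: We assume that flipping the membership of element $i$ in $A$ can only decrease the value of $F(\mathbf{x}_p(A))$. The effect of this local step on $\mathbf{x}_p(A)$ is that the value of the $i$-th coordinate changes from $p$ to $1 - p$ or vice versa (depending on whether $i$ is in $A$ or not). Since $F$ is linear when only one coordinate is being changed, the flip changes $F$ by $-(2p-1)\frac{\partial F}{\partial x_i}$ if $i \in A$ and by $(2p-1)\frac{\partial F}{\partial x_i}$ if $i \notin A$; as $2p - 1 > 0$, this implies $\frac{\partial F}{\partial x_i} \ge 0$ if $i \in A$, and $\frac{\partial F}{\partial x_i} \le 0$ if $i \notin A$. By the chain rule, we have

$$\frac{dF(\mathbf{x}_p(A))}{dp} = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i} \cdot \frac{d(\mathbf{x}_p(A))_i}{dp}.$$

Since $(\mathbf{x}_p(A))_i = p$ if $i \in A$ and $1 - p$ otherwise, we get

$$\frac{dF(\mathbf{x}_p(A))}{dp} = \sum_{i \in A}\frac{\partial F}{\partial x_i} - \sum_{i \notin A}\frac{\partial F}{\partial x_i} \ge 0,$$

using the conditions above. □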



In the next lemma, we prove a stronger bound on the derivative $\frac{d}{dp}F(\mathbf{x}_p(A))$ which will be our main tool in proving Theorem 3.1. This can be combined with the analysis of [9] to achieve a certain improvement. For instance, [9] implies that if $A$ is a local optimum for $p = 2/3$, we have either $f(\bar{A}) \ge \frac{2}{5}OPT$ or $F(\mathbf{x}_p(A)) \ge \frac{2}{5}OPT$. Suppose we start our analysis from the point $p = 2/3$. (The algorithm does not need to be modified, since at $p = 2/3$ it finds a local optimum in any case, and this is sufficient for the analysis.) Unless already $f(\bar{A}) > \frac{2}{5}OPT$ or $F(\mathbf{x}_p(A)) > \frac{2}{5}OPT$, the following lemma (with $1 - p = 2p - 1 = \frac{1}{3}$) shows that $\frac{d}{dp}F(\mathbf{x}_p(A))$ is a constant fraction of $OPT$:

$$\frac{1}{3} \cdot \frac{d}{dp}F(\mathbf{x}_p(A)) \ge OPT\left(1 - \frac{4}{5} - \frac{1}{3}\cdot\frac{2}{5}\right) = \frac{1}{15}OPT.$$

Therefore, in some $\delta$-sized interval, the value of $F(\mathbf{x}_p(A))$ will increase at a slope proportional to $OPT$. Thus the approximation factor of Algorithm 1 is strictly greater than 2/5. We remark that we use a different starting point to achieve the factor of 0.41, and we defer the precise analysis to Appendix B.

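Lemma 3.3. Let $OPT = \max_{S \subseteq X} f(S)$, $p \in (\frac{1}{2}, 1]$, and suppose $\mathbf{x}_p(A)$ is a local optimum in the sense that $F(\mathbf{x}_p(A \Delta \{i\})) \le F(\mathbf{x}_p(A))$ for all $i$. Then

$$(1 - p) \cdot \frac{d}{dp}F(\mathbf{x}_p(A)) \ge OPT - 2F(\mathbf{x}_p(A)) - (2p - 1) f(\bar{A}).$$

Proof: Let $C$ denote an optimal solution, i.e. $f(C) = OPT$. Let $A$ denote a local optimum with respect to $F(\mathbf{x}_p(A))$, and $B = \bar{A}$ its complement. In our notation using diagrams,

$$F(\mathbf{x}_p(A)) = F(p\mathbf{1}_A + (1-p)\mathbf{1}_B) = \left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right].$$

The top row is the current solution $A$, the bottom row is its complement $B$, and the left-hand column is the optimum $C$. We proceed in two steps. Define

$$G(\mathbf{x}) = (\mathbf{1}_C - \mathbf{x}) \cdot \nabla F(\mathbf{x})$$

to denote the derivative of $F$ when moving from $\mathbf{x}$ towards the actual optimum $\mathbf{1}_C$. By Lemma 3.2 we have

$$(1-p)\frac{dF(\mathbf{x}_p(A))}{dp} = (1-p)\Big(\sum_{i \in A}\frac{\partial F}{\partial x_i} - \sum_{i \in B}\frac{\partial F}{\partial x_i}\Big) \ge (1-p)\sum_{i \in A \cap C}\frac{\partial F}{\partial x_i} - p\sum_{i \in A \setminus C}\frac{\partial F}{\partial x_i} + p\sum_{i \in B \cap C}\frac{\partial F}{\partial x_i} - (1-p)\sum_{i \in B \setminus C}\frac{\partial F}{\partial x_i} = G(\mathbf{x}_p(A)),$$

using the definition of $\mathbf{x}_p(A)$ and the fact that $\frac{\partial F}{\partial x_i} \ge 0$ for $i \in A \setminus C$ and $\frac{\partial F}{\partial x_i} \le 0$ for $i \in B \cap C$.

Next, we use Lemma A.1 to estimate $G(\mathbf{x}_p(A))$ as follows. To simplify notation, we denote $\mathbf{x}_p(A)$ simply by $\mathbf{x}$. If we start from $\mathbf{x}$ and increase the coordinates in $A \cap C$ by $1 - p$ and those in $B \cap C$ by $p$, Lemma A.1 says the value of $F$ will change by

$$\left[\begin{smallmatrix} 1 & p \\ 1 & 1-p \end{smallmatrix}\right] - \left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right] \le (1-p)\sum_{i \in A \cap C}\frac{\partial F}{\partial x_i} + p\sum_{i \in B \cap C}\frac{\partial F}{\partial x_i}. \qquad (1)$$

Similarly, if we decrease the coordinates in $A \setminus C$ by $p$ and those in $B \setminus C$ by $1 - p$, the value will change by

$$\left[\begin{smallmatrix} p & 0 \\ 1-p & 0 \end{smallmatrix}\right] - \left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right] \le -p\sum_{i \in A \setminus C}\frac{\partial F}{\partial x_i} - (1-p)\sum_{i \in B \setminus C}\frac{\partial F}{\partial x_i}. \qquad (2)$$

Adding inequalities (1), (2) and noting the expression for $G(\mathbf{x})$ above, we obtain:

$$\left[\begin{smallmatrix} 1 & p \\ 1 & 1-p \end{smallmatrix}\right] + \left[\begin{smallmatrix} p & 0 \\ 1-p & 0 \end{smallmatrix}\right] - 2\left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right] \le G(\mathbf{x}). \qquad (3)$$

It remains to relate the LHS of equation (3) to the value of $OPT$. We use the "threshold lemma" (see Lemma A.3 and the accompanying example with equation (8)):

$$\left[\begin{smallmatrix} 1 & p \\ 1 & 1-p \end{smallmatrix}\right] \ge (1-p)\left[\begin{smallmatrix} 1 & 1 \\ 1 & 1 \end{smallmatrix}\right] + (2p-1)\left[\begin{smallmatrix} 1 & 1 \\ 1 & 0 \end{smallmatrix}\right] + (1-p)\left[\begin{smallmatrix} 1 & 0 \\ 1 & 0 \end{smallmatrix}\right],$$

$$\left[\begin{smallmatrix} p & 0 \\ 1-p & 0 \end{smallmatrix}\right] \ge (1-p)\left[\begin{smallmatrix} 1 & 0 \\ 1 & 0 \end{smallmatrix}\right] + (2p-1)\left[\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right] + (1-p)\left[\begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix}\right].$$

Combining these inequalities with (3), using $\left[\begin{smallmatrix} 1 & 0 \\ 1 & 0 \end{smallmatrix}\right] = f(C) = OPT$ and dropping the nonnegative terms $(1-p)f(X)$ and $(1-p)f(\emptyset)$, we get

$$G(\mathbf{x}) \ge 2(1-p)OPT - 2\left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right] + (2p-1)\left(\left[\begin{smallmatrix} 1 & 1 \\ 1 & 0 \end{smallmatrix}\right] + \left[\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right]\right).$$

Recall that $F(\mathbf{x}) = \left[\begin{smallmatrix} p & p \\ 1-p & 1-p \end{smallmatrix}\right]$. Finally, we add $(2p-1)f(\bar{A}) = (2p-1)\left[\begin{smallmatrix} 0 & 0 \\ 1 & 1 \end{smallmatrix}\right]$ to this inequality, so that we can use submodularity to take advantage of the last two terms: $f(A \cup C) + f(B) \ge f(X) + f(B \cap C)$ and $f(B \cap C) + f(A \cap C) \ge f(C) + f(\emptyset)$. Hence

$$G(\mathbf{x}) + (2p-1)f(\bar{A}) \ge 2(1-p)OPT - 2F(\mathbf{x}) + (2p-1)\left(\left[\begin{smallmatrix} 1 & 1 \\ 1 & 0 \end{smallmatrix}\right] + \left[\begin{smallmatrix} 1 & 0 \\ 0 & 0 \end{smallmatrix}\right] + \left[\begin{smallmatrix} 0 & 0 \\ 1 & 1 \end{smallmatrix}\right]\right) \ge 2(1-p)OPT - 2F(\mathbf{x}_p(A)) + (2p-1)OPT = OPT - 2F(\mathbf{x}_p(A)).$$

Together with $(1-p)\frac{d}{dp}F(\mathbf{x}_p(A)) \ge G(\mathbf{x}_p(A))$, this proves the lemma. □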
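Lemmas 3.2 and 3.3 can be sanity-checked numerically on small instances, where $F$ and its partial derivatives are computed exactly by enumeration ($\frac{\partial F}{\partial x_i} = F(\mathbf{x}|_{x_i = 1}) - F(\mathbf{x}|_{x_i = 0})$, because $F$ is multilinear). The sketch below is such a check under our own assumptions: random weighted directed cut functions as test instances, a local optimum found by single flips starting from $A = \emptyset$, and the inequality evaluated with the complement term $(2p-1)f(\bar{A})$, which is what the submodularity step at the end of the proof requires. A `True` first component means the sign conditions of Lemma 3.2 hold, and a nonnegative second component means the inequality of Lemma 3.3 holds on that instance; this is an illustration, not a proof.

```python
import random
from itertools import product

def multilinear(f, x):
    """Exact multilinear extension F(x) by enumerating all 2^n subsets."""
    total = 0.0
    for bits in product((False, True), repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def partial(f, x, i):
    """dF/dx_i = F(x with x_i = 1) - F(x with x_i = 0), since F is linear in each coordinate."""
    hi, lo = list(x), list(x)
    hi[i], lo[i] = 1.0, 0.0
    return multilinear(f, hi) - multilinear(f, lo)

def local_optimum(f, n, p, eps=1e-12):
    """Single-flip local search on F(x_p(A)), starting from the empty set."""
    xp = lambda A: [p if i in A else 1.0 - p for i in range(n)]
    A = frozenset()
    while True:
        cur = multilinear(f, xp(A))
        flip = next((i for i in range(n) if multilinear(f, xp(A ^ {i})) > cur + eps), None)
        if flip is None:
            return A, xp(A)
        A = A ^ {flip}

def lemma_gap(f, n, p):
    """(sign conditions of Lemma 3.2 hold?, LHS - RHS of Lemma 3.3) at a local optimum."""
    A, x = local_optimum(f, n, p)
    grads = [partial(f, x, i) for i in range(n)]
    signs_ok = all((g >= -1e-9) if i in A else (g <= 1e-9) for i, g in enumerate(grads))
    dF_dp = sum(g if i in A else -g for i, g in enumerate(grads))
    opt = max(f(frozenset(i for i, b in enumerate(bits) if b)) for bits in product((0, 1), repeat=n))
    complement = frozenset(range(n)) - A
    rhs = opt - 2.0 * multilinear(f, x) - (2.0 * p - 1.0) * f(complement)
    return signs_ok, (1.0 - p) * dF_dp - rhs

def random_dicut(n, seed):
    """Random weighted directed cut function (nonnegative, submodular, non-monotone)."""
    rng = random.Random(seed)
    w = {(u, v): rng.random() for u in range(n) for v in range(n) if u != v}
    return lambda S: sum(c for (u, v), c in w.items() if u in S and v not in S)

for seed in range(3):
    print(lemma_gap(random_dicut(5, seed), 5, p=0.7))   # expect (True, a value >= 0)
```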
We have proved that unless the current solution is already very valuable, there is a certain improvement
that can be achieved by increasing p. The next lemma transforms this statement into an inequality describing 
the evolution of the simulated-annealing algorithm. 

Lemma 3.4. Let $A(p)$ denote the local optimum found by the simulated annealing algorithm at temperature $t = 1 - p$, and let $\Phi(p) = F(\mathbf{x}_p(A(p)))$ denote its value. Assume also that for all $p$, we have $f(\overline{A(p)}) \le \beta$. Then

$$\frac{1-p}{\delta}\big(\Phi(p + \delta) - \Phi(p)\big) \ge (1 - 2\delta n^2)OPT - 2\Phi(p) - (2p - 1)\beta.$$



Proof: Here we combine the positive drift obtained from decreasing the temperature (described by Lemma 3.3) and from local search (which is certainly nonnegative). Consider the local optimum $A$ obtained at temperature $t = 1 - p$. Its value is $\Phi(p) = F(\mathbf{x}_p(A))$. By decreasing the temperature by $\delta$, we obtain a solution $\mathbf{x}_{p+\delta}(A)$, whose value can be estimated in the first order by the derivative at $p$ (see Lemma A.2 for a precise argument):

$$F(\mathbf{x}_{p+\delta}(A)) \ge F(\mathbf{x}_p(A)) + \delta\,\frac{dF(\mathbf{x}_p(A))}{dp} - \delta^2 n^2 \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|.$$

This is followed by another local-search stage, in which we obtain a new local optimum $A'$. In this stage, the value of the objective function cannot decrease, so we have $\Phi(p + \delta) = F(\mathbf{x}_{p+\delta}(A')) \ge F(\mathbf{x}_{p+\delta}(A))$. We have $\sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right| \le \max_{S,i,j}|f(S + i + j) - f(S + i) - f(S + j) + f(S)| \le 2OPT$. We also estimate $\frac{d}{dp}F(\mathbf{x}_p(A))$ using Lemma 3.3 to obtain

$$\Phi(p + \delta) \ge F(\mathbf{x}_{p+\delta}(A)) \ge F(\mathbf{x}_p(A)) + \frac{\delta}{1-p}\big(OPT - 2F(\mathbf{x}_p(A)) - (2p-1)f(\bar{A})\big) - 2\delta^2 n^2\, OPT.$$

Finally, we use $f(\bar{A}) \le \beta$, $F(\mathbf{x}_p(A)) = \Phi(p)$ and $1 - p \le 1$ to derive the statement of the lemma. □



We only sketch the remainder of the analysis. By taking $\delta \to 0$, the statement of Lemma 3.4 leads naturally to the following differential equation:

$$(1 - p)\Phi'(p) \ge OPT - 2\Phi(p) - (2p - 1)\beta.$$

This equation can be solved analytically. Starting from initial condition $\Phi(p_0) = v_0$, we get for any $p \ge p_0$ (normalizing $OPT = 1$):

$$\Phi(p) \ge \frac{1}{2}(1 - \beta) + 2\beta(1 - p) - \frac{(1-p)^2}{(1-p_0)^2}\left(\frac{1}{2}(1 - \beta) + 2\beta(1 - p_0) - v_0\right).$$

Choosing the starting point is a non-trivial issue; for example $p_0 = 1/2$ and $v_0 = 1/4$ (the uniformly random 1/4-approximation of [9]) does not give any improvement over 2/5. It turns out that the best choice is $p_0 =$ [?], even though the corresponding value $v_0$ is less than 2/5. We prove that we can pick a value $\beta > 0.41$ such that the solution of the differential equation starting at this $p_0$ reaches a point $p_1$ such that $\Phi(p_1) \ge \beta$. Details can be found in Appendix B.
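The closed-form bound can be sanity-checked numerically. The sketch below (with $OPT$ normalized to 1, and hypothetical helper names) evaluates the formula, cross-checks it against a forward-Euler integration of the differential equation taken with equality, and reproduces the remark that the start $p_0 = 1/2$, $v_0 = 1/4$ cannot certify 2/5. For that start, a short calculation (ours) shows that the maximum of the bound over $p$ is $\frac{1}{2}(1-\beta) + \beta^2/(1+2\beta)$, which is at least $\beta$ exactly when $4\beta^2 + \beta - 1 \le 0$, i.e. $\beta \le (\sqrt{17}-1)/8 \approx 0.390$. The 0.41 analysis itself, with its different starting point, is not reproduced here.

```python
import math

def phi_closed_form(p, p0, v0, beta):
    """Solution of (1 - p) phi'(p) = 1 - 2 phi(p) - (2p - 1) beta with phi(p0) = v0 (OPT = 1)."""
    u, u0 = 1.0 - p, 1.0 - p0
    c = 0.5 * (1.0 - beta) + 2.0 * beta * u0 - v0
    return 0.5 * (1.0 - beta) + 2.0 * beta * u - (u / u0) ** 2 * c

def phi_euler(p_end, p0, v0, beta, steps=100_000):
    """Forward-Euler integration of the same equation, as an independent cross-check."""
    h = (p_end - p0) / steps
    p, phi = p0, v0
    for _ in range(steps):
        phi += h * (1.0 - 2.0 * phi - (2.0 * p - 1.0) * beta) / (1.0 - p)
        p += h
    return phi

def best_value_on_path(p0, v0, beta, grid=10_000):
    """max over p in [p0, 1) of the lower bound phi(p); beta is certified if this is >= beta."""
    return max(phi_closed_form(p0 + k * (1.0 - p0) / grid, p0, v0, beta) for k in range(grid))

# Starting from the uniformly random solution (p0 = 1/2, v0 = 1/4):
print(best_value_on_path(0.5, 0.25, 0.40))        # about 0.389 < 0.40: cannot certify 2/5
beta_star = (math.sqrt(17) - 1) / 8                # about 0.3904, root of 4 beta^2 + beta - 1 = 0
print(best_value_on_path(0.5, 0.25, beta_star))   # about beta_star
```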

4 Matroid Independence Constraint 

Let $\mathcal{M} = (X, \mathcal{I})$ be a matroid. We design an algorithm for the case of submodular maximization subject to a matroid independence constraint, $\max\{f(S) : S \in \mathcal{I}\}$, as follows. The algorithm uses fractional local search to solve the optimization problem $\max\{F(\mathbf{x}) : \mathbf{x} \in P_t(\mathcal{M})\}$, where $P_t(\mathcal{M}) = P(\mathcal{M}) \cap [0, t]^X$ is a matroid polytope intersected with a box. This technique, which has been used already in [34], is combined with a simulated annealing procedure, where the parameter $t$ is gradually being increased from 0 to 1. (The analogy with simulated annealing is less explicit here; in some sense the system exhibits the most randomness in the middle of the process, when $t = 1/2$.) Finally, the fractional solution is rounded using pipage rounding [?] [31]; we omit this stage from the description of the algorithm.

The main difficulty in designing the algorithm is how to handle the temperature-increasing step. Unlike in the unconstrained problem, we cannot just increment all variables which were previously saturated at $x_i = t$, because this might violate the matroid constraint. Instead, we find a subset of variables that can be increased, by reduction to a bipartite matching problem. We need the following definitions.

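Definition 4.1. Let $0$ be an extra element not occurring in the ground set $X$, and define formally $\frac{\partial F}{\partial x_0} = 0$. For $\mathbf{x} = \frac{1}{N}\sum_{\ell=1}^{N} \mathbf{1}_{I_\ell}$ and $i \notin I_\ell$, we define $b_\ell(i) = \mathrm{argmin}_{j \in I_\ell \cup \{0\} : I_\ell - j + i \in \mathcal{I}} \frac{\partial F}{\partial x_j}$.

In other words, $b_\ell(i)$ is the least valuable element which can be exchanged for $i$ in the independent set $I_\ell$. Note that such an element must exist due to the matroid axioms. We also consider $b_\ell(i) = 0$ as an option in case $I_\ell + i$ itself is independent. In the following, $0$ can be thought of as a special "empty" element, and the partial derivative $\frac{\partial F}{\partial x_0}$ is considered identically equal to zero. By definition, we get the following statement.

Lemma 4.2. For $b_\ell(i)$ defined as above, we have $\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{b_\ell(i)}} = \max_{j \in I_\ell \cup \{0\} : I_\ell - j + i \in \mathcal{I}} \left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_j}\right)$.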
The following definition is important for the description of our algorithm. 



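Definition 4.3. For $\mathbf{x} = \frac{1}{N}\sum_{\ell=1}^{N} \mathbf{1}_{I_\ell}$, let $A = \{i : x_i = t\}$. We define a bipartite "fractional exchange graph" $G_{\mathbf{x}}$ on $A \cup [N]$ as follows: We have an edge $(i, \ell) \in E$ whenever $i \notin I_\ell$. We define its weight as

$$w_{i\ell} = \frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{b_\ell(i)}} = \max_{j \in I_\ell \cup \{0\} : I_\ell - j + i \in \mathcal{I}} \left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_j}\right).$$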
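Definitions 4.1 and 4.3 only need an independence oracle and the gradient values. The following sketch computes $b_\ell(i)$ and the edge weights $w_{i\ell}$ by trying every exchange; it is an illustration, not the paper's implementation: the partition matroid, the gradient values and all names (`best_exchange`, `exchange_graph`, with `None` standing in for the extra element $0$) are hypothetical. Finding the max-weight matching of Step 12 is not shown.

```python
from typing import Dict, Optional, Tuple

DUMMY = None   # plays the role of the extra element "0", with dF/dx_0 := 0

def best_exchange(i, I, is_independent, grad) -> Tuple[Optional[str], float]:
    """b_I(i) from Definition 4.1 together with the weight dF/dx_i - dF/dx_{b_I(i)}."""
    candidates = []
    if is_independent(I | {i}):
        candidates.append((0.0, DUMMY))          # I + i is already independent
    for j in sorted(I):
        if is_independent((I - {j}) | {i}):
            candidates.append((grad[j], j))      # swap j out, i in
    if not candidates:                           # only possible if {i} itself is dependent (a loop)
        raise ValueError(f"{i} cannot be exchanged into {sorted(I)}")
    g_min, b = min(candidates, key=lambda c: c[0])
    return b, grad[i] - g_min

def exchange_graph(A, sets, is_independent, grad) -> Dict[Tuple[str, int], float]:
    """Edges (i, l) with i in A and i not in I_l, weighted as in Definition 4.3."""
    edges = {}
    for i in sorted(A):
        for l, I in enumerate(sets):
            if i not in I:
                edges[(i, l)] = best_exchange(i, I, is_independent, grad)[1]
    return edges

# Hypothetical partition matroid: at most 1 element of {a, b, c} and at most 2 of {d, e, f}.
blocks = [({"a", "b", "c"}, 1), ({"d", "e", "f"}, 2)]
def is_independent(S):
    return all(len(S & block) <= cap for block, cap in blocks)

grad = {"a": 0.5, "b": 0.9, "c": -0.2, "d": 0.2, "e": 0.4, "f": -0.1}  # made-up gradient values
I1, I2 = frozenset({"a", "d"}), frozenset({"c", "d", "f"})
print(best_exchange("b", I1, is_independent, grad))   # b must replace a
print(best_exchange("e", I1, is_independent, grad))   # I1 + e is independent: dummy element
print(exchange_graph({"b", "e"}, [I1, I2], is_independent, grad))
```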



We remark that the vertices of the bipartite exchange graph are not elements of X on both sides, but 
elements on one side and independent sets on the other side. Now we can describe our algorithm. 

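Algorithm 2 Simulated Annealing Algorithm for a Matroid Independence Constraint
Input: A submodular function $f : 2^X \to \mathbb{R}_+$ and a matroid $\mathcal{M} = (X, \mathcal{I})$.
Output: An independent set $A \in \mathcal{I}$ such that $f(A) \ge 0.325 \cdot \max\{f(S) : S \in \mathcal{I}\}$.
1: Let $\mathbf{x} \leftarrow \mathbf{0}$, $N \leftarrow n^{?}$ and $\delta \leftarrow 1/N$.
2: Define $P_t(\mathcal{M}) = P(\mathcal{M}) \cap [0, t]^X$.
3: Maintain a representation of $\mathbf{x} = \frac{1}{N}\sum_{\ell=1}^{N} \mathbf{1}_{I_\ell}$ where $I_\ell \in \mathcal{I}$.
4: for $t \leftarrow 0$; $t \le 1$; $t \leftarrow t + \delta$ do
5:     while there is $\mathbf{v} \in \{\pm\mathbf{e}_i, \mathbf{e}_i - \mathbf{e}_j : i, j \in X\}$ such that $\mathbf{x} + \delta\mathbf{v} \in P_t(\mathcal{M})$ and $F(\mathbf{x} + \delta\mathbf{v}) > F(\mathbf{x})$ do
6:         $\mathbf{x} := \mathbf{x} + \delta\mathbf{v}$   {Local search}
7:     end while
8:     for each of the $n$ possible sets $T_{\le\lambda}(\mathbf{x}) = \{i : x_i \le \lambda\}$ do   {Complementary solution check}
9:         Find a local optimum $B \subseteq T_{\le\lambda}(\mathbf{x})$, $B \in \mathcal{I}$, trying to maximize $f(B)$.
10:        Remember the $B$ with the largest $f(B)$ as a possible candidate for the output of the algorithm.
11:    end for
12:    Form the fractional exchange graph (see Definition 4.3) and find a max-weight matching $M$.
13:    Replace $I_\ell$ by $I_\ell - b_\ell(i) + i$ for each edge $(i, \ell) \in M$, and update the point $\mathbf{x} = \frac{1}{N}\sum_{\ell=1}^{N}\mathbf{1}_{I_\ell}$.   {Temperature relaxation: each coordinate increases by at most $\delta = 1/N$ and hence $\mathbf{x} \in P_{t+\delta}(\mathcal{M})$.}
14: end for
15: return the best encountered solution.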

Theorem 4.4. For any submodular function $f : 2^X \to \mathbb{R}_+$ and matroid $\mathcal{M} = (X, \mathcal{I})$, Algorithm 2 returns with high probability a solution of value at least $0.325 \cdot OPT$, where $OPT = \max_{S \in \mathcal{I}} f(S)$.

Let us point out some differences between the analysis of this algorithm and the one for unconstrained maximization (Algorithm 1). The basic idea is the same: we obtain certain conditions for partial derivatives at the point of a local optimum. These conditions help us either to conclude that the local optimum already has a good value, or to prove that by relaxing the temperature parameter we gain a certain improvement. We will prove the following lemma, which is analogous to Lemma 3.4.

Lemma 4.5. Let $\mathbf{x}(t)$ denote the local optimum found by Algorithm 2 at temperature $t < 1 - 1/n$ right after the "Local search" phase, and let $\Phi(t) = F(\mathbf{x}(t))$ denote the value of this local optimum. Also assume that the solution found in the "Complementary solution check" phase of the algorithm (Steps 8-10) is always at most $\beta$. Then the function $\Phi(t)$ satisfies

$$\frac{1-t}{\delta}\big(\Phi(t + \delta) - \Phi(t)\big) \ge (1 - 2\delta n^2)OPT - 2\Phi(t) - 2\beta t. \qquad (4)$$



We proceed in two steps, again using as an intermediate bound the notion of derivative of $F$ on the line towards the optimum: $G(\mathbf{x}) = (\mathbf{1}_C - \mathbf{x}) \cdot \nabla F(\mathbf{x})$. The plan is to relate the actual gain of the algorithm in the "Temperature relaxation" phase (Steps 12-13) to $G(\mathbf{x})$, and then to argue that $G(\mathbf{x})$ can be compared to the RHS of (4). The second part relies on the submodularity of the objective function and is quite similar to the second part of Lemma 3.3 (although slightly more involved).

The heart of the proof is to show that by relaxing the temperature we gain an improvement of at least $\frac{\delta}{1-t}G(\mathbf{x})$. As the algorithm suggests, the improvement in this step is related to the weight of the matching obtained in Step 12 of the algorithm. Thus the main goal is to prove that there exists a matching of weight at least $\frac{1}{1-t}G(\mathbf{x})$. We prove this by a combinatorial argument using the local optimality of the current fractional solution, and an application of Kőnig's theorem on edge colorings of bipartite graphs. We defer all details of the proof to Appendix D.

Finally, we arrive at a differential equation of the following form:

$$(1 - t)\Phi'(t) \ge OPT - 2\Phi(t) - 2t\beta.$$

This differential equation is very similar to the one we obtained in Section 3 and can be solved analytically as well. We start from initial conditions corresponding to the 0.309-approximation of [34], which implies that a fractional local optimum at $t_0 = \frac{1}{2}(3 - \sqrt{5})$ has value $v_0 \ge \frac{1}{2}(1 - t_0) \approx 0.309$. We prove that there is a value $\beta > 0.325$ such that for some value of $t$ (which turns out to be roughly 0.53), we get $\Phi(t) \ge \beta$. We defer details to Appendix D.
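Solving this equation in the same way as in Section 3 gives (with $OPT = 1$; this is our own derivation, not quoted from the appendix) $\Phi(t) \ge \frac{1}{2}(1 - 2\beta) + 2\beta(1-t) - \frac{(1-t)^2}{(1-t_0)^2}\left(\frac{1}{2}(1 - 2\beta) + 2\beta(1 - t_0) - v_0\right)$. The sketch below is a quick numerical check of the stated numbers under the assumption $v_0 = \frac{1}{2}(1 - t_0)$ exactly (a larger $v_0$ only raises the bound): already with $\beta = 0.325$ the bound peaks slightly above 0.325, near $t \approx 0.536$, which is consistent with the text. The precise constant is derived in Appendix D.

```python
import math

def phi_matroid(t, t0, v0, beta):
    """Solution of (1 - t) phi'(t) = 1 - 2 phi(t) - 2 t beta with phi(t0) = v0 (OPT = 1)."""
    u, u0 = 1.0 - t, 1.0 - t0
    c = 0.5 * (1.0 - 2.0 * beta) + 2.0 * beta * u0 - v0
    return 0.5 * (1.0 - 2.0 * beta) + 2.0 * beta * u - (u / u0) ** 2 * c

t0 = (3.0 - math.sqrt(5.0)) / 2.0      # about 0.382
v0 = (1.0 - t0) / 2.0                  # about 0.309, the starting guarantee
beta = 0.325
grid = [t0 + k * (1.0 - t0) / 10_000 for k in range(10_000)]
t_best = max(grid, key=lambda t: phi_matroid(t, t0, v0, beta))
print(round(t_best, 3), phi_matroid(t_best, t0, v0, beta))   # peak near t = 0.536, value above 0.325
```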

5 Hardness of approximation 

In this section, we improve the hardness of approximating several submodular maximization problems subject to additional constraints (i.e. $\max\{f(S) : S \in \mathcal{F}\}$), assuming the value oracle model. We use the method of symmetry gap [34] to derive these new results. This method can be summarized as follows. We start with a fixed instance $\max\{f(S) : S \in \mathcal{F}\}$ which is symmetric under a certain group $\mathcal{G}$ of permutations of the ground set $X$. We consider the multilinear relaxation of this instance, $\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$. We compute the symmetry gap $\gamma = \overline{OPT}/OPT$, where $OPT = \max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$ is the optimum of the relaxed problem and $\overline{OPT} = \max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F}),\ \sigma(\mathbf{x}) = \mathbf{x} \text{ for all } \sigma \in \mathcal{G}\}$ is the optimum over all symmetric fractional solutions. Due to [34, Theorem 1.6], we obtain hardness of $(1 + \epsilon)\gamma$-approximation for a class of related instances, as follows.

Theorem 5.1 ([34]). Let $\max\{f(S) : S \in \mathcal{F}\}$ be an instance of a nonnegative submodular maximization problem with symmetry gap $\gamma = \overline{\mathrm{OPT}}/\mathrm{OPT}$. Let $\mathcal{C}$ be the class of instances $\max\{\tilde{f}(S) : S \in \tilde{\mathcal{F}}\}$ where $\tilde{f}$ is nonnegative submodular and $\tilde{\mathcal{F}}$ is a "refinement" of $\mathcal{F}$. Then for every $\epsilon > 0$, any $(1 + \epsilon)\gamma$-approximation algorithm for the class of instances $\mathcal{C}$ would require exponentially many value queries to $\tilde{f}(S)$.

For a formal definition of "refinement", we refer to [34, Definition 1.5]. Intuitively, these are "blown-up"
copies of the original family of feasible sets, such that the constraint is of the same type as the original 
instance (e.g. cardinality, matroid independence and matroid base constraints are preserved). 

Directed hypergraph cuts. Our main tool in deriving these new results is a construction using a variant 
of the Max Di-cut problem in directed hypergraphs. We consider the following variant of directed hypergraphs. 

Definition 5.2. A directed hypergraph is a pair $H = (X, E)$, where $E$ is a set of directed hyperedges $(U, v)$, where $U \subset X$ is a non-empty subset of vertices and $v \notin U$ is a vertex in $X$.

For a set $S \subset X$, we say that a hyperedge $(U, v)$ is cut by $S$, or $(U, v) \in \delta(S)$, if $U \cap S \neq \emptyset$ and $v \notin S$.

Note that a directed hyperedge should have exactly one head. An example of a directed hypergraph is shown in Figure 2. We will construct our hard examples as Max Di-cut instances on directed hypergraphs. It is easy to see that the number (or weight) of hyperedges cut by a set $S$ is indeed submodular as a function of $S$. Other types of directed hypergraphs have been considered, in particular with hyperedges of multiple heads and tails, but a natural extension of the cut function to such hypergraphs is no longer submodular.
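The submodularity claim above can be sanity-checked by brute force on a small example. The following Python sketch assumes a hypothetical encoding of a weighted directed hypergraph as a list of `(tails, head, weight)` triples with the head outside the tail set, evaluates the cut function of Definition 5.2, and checks $f(S \cup T) + f(S \cap T) \le f(S) + f(T)$ over all pairs of subsets of a six-element ground set. It illustrates the claim on one toy instance and is not a proof.

```python
from itertools import combinations

def hypercut_value(S, hyperedges):
    """Weight of the directed hyperedges (tails, head, w) cut by S:
    at least one tail lies in S and the head does not (Definition 5.2)."""
    S = set(S)
    return sum(w for tails, head, w in hyperedges if S & set(tails) and head not in S)

def all_subsets(ground):
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield frozenset(combo)

def is_submodular(f, ground, tol=1e-12):
    """Brute-force check of f(S | T) + f(S & T) <= f(S) + f(T) for all pairs S, T."""
    subsets = list(all_subsets(ground))
    return all(f(S | T) + f(S & T) <= f(S) + f(T) + tol
               for S in subsets for T in subsets)

# Hypothetical toy hypergraph on vertices 0..5 (each head lies outside its tail set).
H = [({0, 1, 2}, 3, 1.0), ({3, 4}, 0, 2.0), ({5}, 1, 0.5)]
cut = lambda S: hypercut_value(S, H)
print(is_submodular(cut, range(6)))   # True
```

Allowing a hyperedge to have several heads (for example, requiring all heads to lie outside $S$) is exactly the kind of change that can break this check.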

In the rest of this section, we present our hardness result for maximizing submodular functions subject 
to a matroid base constraint. We defer the remaining results to Appendix E.

Theorem 5.3. There exist instances of the problem $\max\{f(S) : S \in \mathcal{B}\}$, where $f$ is a nonnegative submodular function, $\mathcal{B}$ is a collection of matroid bases of packing number at least 2, and any $(1 - e^{-1/2} + \epsilon)$-approximation for this problem would require exponentially many value queries for any $\epsilon > 0$.







Figure 2: Example for maximizing a submodular function subject to a matroid base constraint; the objective 
function is a directed hypergraph cut function, and the constraint is that we should pick exactly 1 element 
of A and 1 element of B. 



We remark that $1 - e^{-1/2} < 0.394$, and only hardness of $(0.5 + \epsilon)$-approximation was previously known in this setting.

Instance 1. Consider the hypergraph in Figure 2 with the set of vertices $X = A \cup B$, where $A = \{a, b\}$ is the set of heads and $B = \{a_1, \ldots, a_k, b_1, \ldots, b_k\}$ is the set of tails, and two hyperedges $(\{a_1, \ldots, a_k\}, a)$ and $(\{b_1, \ldots, b_k\}, b)$. Let $f$ be the cut function on this graph, and let $\mathcal{M}_{A,B}$ be a partition matroid whose independent sets contain at most one vertex from each of the sets $A$ and $B$. Let $\mathcal{B}_{A,B}$ be the bases of $\mathcal{M}_{A,B}$ (i.e. $\mathcal{B}_{A,B} = \{S : |S \cap A| = 1 \text{ and } |S \cap B| = 1\}$). Note that there exist two disjoint bases in this matroid and the base packing number of $\mathcal{M}_{A,B}$ is equal to 2. An optimum solution is for example $S = \{a, b_1\}$ with $\mathrm{OPT} = 1$.

In order to apply Theorem 5.1, we need to compute the symmetry gap of this instance $\gamma = \overline{\mathrm{OPT}}/\mathrm{OPT}$. We remark that in the blown-up instances, $\overline{\mathrm{OPT}}$ corresponds to the maximum value that any algorithm can obtain, while $\mathrm{OPT} = 1$ is the actual optimum. The definition of $\overline{\mathrm{OPT}}$ depends on the symmetries of our instance, which we describe in the following lemma.

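Lemma 5.4. There exists a group $\mathcal{G}$ of permutations such that Instance 1 is symmetric under $\mathcal{G}$, in the sense that for all $\sigma \in \mathcal{G}$,

$$f(S) = f(\sigma(S)), \qquad S \in \mathcal{B}_{A,B} \iff \sigma(S) \in \mathcal{B}_{A,B}. \qquad (5)$$

Moreover, for any two vertices $i, j \in A$ (or $B$), the probability that $\sigma(i) = j$ for a uniformly random $\sigma \in \mathcal{G}$ is equal to $1/|A|$ (or $1/|B|$ respectively).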

Proof: Let $\Pi$ be the set of the following two basic permutations

$$\Pi = \left\{ \begin{array}{l} \sigma_1 : \sigma_1(a) = b,\ \sigma_1(b) = a,\ \sigma_1(a_i) = b_i,\ \sigma_1(b_i) = a_i, \\ \sigma_2 : \sigma_2(a) = a,\ \sigma_2(b) = b,\ \sigma_2(a_i) = a_{(i \bmod k) + 1},\ \sigma_2(b_i) = b_i \end{array} \right\}$$

where $\sigma_1$ swaps the vertices of the two hyperedges and $\sigma_2$ only rotates the tail vertices of one of the hyperedges. It is easy to see that both of these permutations satisfy equation (5). Therefore, our instance is invariant under each of the basic permutations and also under any permutation generated by them. Now let $\mathcal{G}$ be the set of all the permutations that are generated by $\Pi$. $\mathcal{G}$ is a group and under this group of symmetries all the elements in $A$ (and $B$) are equivalent. In other words, for any three vertices $i, j, j' \in A$ (or $B$), the number of permutations $\sigma \in \mathcal{G}$ such that $\sigma(i) = j$ is equal to the number of permutations such that $\sigma(i) = j'$. □

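Using the above lemma we may compute the symmetrization of a vector $\mathbf{x} \in [0,1]^X$, which will be useful in computing $\overline{\mathrm{OPT}}$ [34]. For any vector $\mathbf{x} \in [0,1]^X$, the "symmetrization of $\mathbf{x}$" is:

$$\bar{\mathbf{x}} = \mathbb{E}_{\sigma \in \mathcal{G}}[\sigma(\mathbf{x})] \;\Longrightarrow\; \begin{cases} \bar{x}_a = \bar{x}_b = \frac{1}{2}(x_a + x_b), \\ \bar{x}_{a_1} = \cdots = \bar{x}_{a_k} = \bar{x}_{b_1} = \cdots = \bar{x}_{b_k} = \frac{1}{2k}\sum_{i=1}^{k}(x_{a_i} + x_{b_i}), \end{cases} \qquad (6)$$

where $\sigma(\mathbf{x})$ denotes $\mathbf{x}$ with coordinates permuted by $\sigma$. Now we are ready to prove Theorem 5.3.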
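Equation (6) can be reproduced mechanically for a small $k$. The Python sketch below generates $\mathcal{G}$ as the closure of $\{\sigma_1, \sigma_2\}$ under composition and averages $\sigma(\mathbf{x})$ over the group. The vertex indexing (0 for $a$, 1 for $b$, then $a_1, \ldots, a_k$, then $b_1, \ldots, b_k$) and the test vector are assumptions made only for this illustration.

```python
def generators(k):
    """sigma_1 and sigma_2 of Lemma 5.4 as tuples p with p[i] = image of vertex i.
    Hypothetical indexing: 0 = a, 1 = b, 2..k+1 = a_1..a_k, k+2..2k+1 = b_1..b_k."""
    a_t = list(range(2, 2 + k))
    b_t = list(range(2 + k, 2 + 2 * k))
    s1 = [0] * (2 + 2 * k)
    s1[0], s1[1] = 1, 0
    for ai, bi in zip(a_t, b_t):
        s1[ai], s1[bi] = bi, ai
    s2 = list(range(2 + 2 * k))
    for i, ai in enumerate(a_t):
        s2[ai] = a_t[(i + 1) % k]          # a_i -> a_{(i mod k) + 1}
    return tuple(s1), tuple(s2)

def generate_group(gens):
    """Closure of the generators under composition (finite, so no inverses are needed)."""
    n = len(gens[0])
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = tuple(s[g[i]] for i in range(n))   # s composed with g
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def symmetrize(x, group):
    """Average of sigma(x) over the group, where sigma(x)_{sigma(i)} = x_i."""
    avg = [0.0] * len(x)
    for p in group:
        for i, xi in enumerate(x):
            avg[p[i]] += xi / len(group)
    return avg

k = 3
G = generate_group(generators(k))
xbar = symmetrize([0.9, 0.1, 0.2, 0.4, 0.6, 0.3, 0.5, 0.7], G)
print(len(G), [round(v, 6) for v in xbar])
```

For $k = 3$ the group has $2k^2 = 18$ elements (independent rotations of the two tail sets, with or without the swap $\sigma_1$), and the averaged vector has $\bar{x}_a = \bar{x}_b = 0.5$ and every tail coordinate equal to $2.7/6 = 0.45$, as (6) predicts.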



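Proof: [Theorem 5.3] We need to compute the value of the symmetry gap $\gamma = \overline{\mathrm{OPT}} = \max\{F(\bar{\mathbf{x}}) : \mathbf{x} \in P(\mathcal{B}_{A,B})\}$, where $F$ is the multilinear relaxation of $f$ and $P(\mathcal{B}_{A,B})$ is the convex hull of the bases in $\mathcal{B}_{A,B}$. For any vector $\mathbf{x} \in [0,1]^X$, we have

$$\mathbf{x} \in P(\mathcal{B}_{A,B}) \iff \begin{cases} x_a + x_b = 1, \\ \sum_{i=1}^{k}(x_{a_i} + x_{b_i}) = 1. \end{cases} \qquad (7)$$

By equation (6) we know that the vertices in each of the sets $A$, $B$ have the same value in $\bar{\mathbf{x}}$. Using equation (7), we obtain $\bar{x}_a = \bar{x}_b = \frac{1}{2}$ and $\bar{x}_{a_i} = \bar{x}_{b_i} = \frac{1}{2k}$ for all $1 \le i \le k$, which yields a unique symmetrized solution $\bar{\mathbf{x}} = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2k}, \ldots, \frac{1}{2k})$.

Now we can simply compute $\overline{\mathrm{OPT}} = F(\frac{1}{2}, \frac{1}{2}, \frac{1}{2k}, \ldots, \frac{1}{2k})$. Note that by definition a hyperedge will be cut by a random set $S$ if and only if at least one of its tails is included in $S$ while its head is not. Therefore

$$\overline{\mathrm{OPT}} = F\left(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2k}, \ldots, \tfrac{1}{2k}\right) = 2 \cdot \frac{1}{2}\left[1 - \left(1 - \frac{1}{2k}\right)^{k}\right] \approx 1 - e^{-1/2}$$

for sufficiently large $k$. By applying Theorem 5.1, it can be seen that the refined instances are instances of submodular maximization over the bases of a matroid where the ground set is partitioned into $A \cup B$ and we have to take half of the elements of $A$ and a $\frac{1}{2k}$ fraction of the elements in $B$. Thus the base packing number of the matroid in the refined instances is also 2, which implies the theorem. □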
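The limit in the last display can be checked numerically. The sketch below assumes the standard closed form of the multilinear extension of a directed hypergraph cut function, which follows from independent rounding: a hyperedge $(U, v)$ of weight $w$ contributes $w(1 - x_v)\bigl(1 - \prod_{u \in U}(1 - x_u)\bigr)$, since $v \notin U$. The vertex indexing is hypothetical.

```python
import math

def hypercut_multilinear(x, hyperedges):
    """Multilinear extension of a directed hypergraph cut function: hyperedge
    (tails, head, w) is cut by the random set R (i in R independently w.p. x[i])
    iff the head is absent and at least one tail is present."""
    total = 0.0
    for tails, head, w in hyperedges:
        p_no_tail = math.prod(1.0 - x[u] for u in tails)
        total += w * (1.0 - x[head]) * (1.0 - p_no_tail)
    return total

def instance1_symmetric_value(k):
    """F(xbar) for Instance 1 with xbar = (1/2, 1/2, 1/(2k), ..., 1/(2k)).
    Hypothetical indexing: 0 = a, 1 = b, 2..k+1 = a_i, k+2..2k+1 = b_i."""
    hyperedges = [(range(2, 2 + k), 0, 1.0), (range(2 + k, 2 + 2 * k), 1, 1.0)]
    xbar = [0.5, 0.5] + [1.0 / (2 * k)] * (2 * k)
    return hypercut_multilinear(xbar, hyperedges)

for k in (1, 10, 1000, 100000):
    print(k, instance1_symmetric_value(k))
print("limit 1 - e^(-1/2) =", 1 - math.exp(-0.5))
```

The values decrease with $k$ towards $1 - e^{-1/2} \approx 0.3935$, matching the approximation used in the proof.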
A Miscellaneous Lemmas

Let F be the multilinear extension of a submodular function. The first lemma says that if we increase 
coordinates simultaneously, then the increase in F is at most that given by partial derivatives at the lower 
point, and at least that given by partial derivatives at the upper point. 



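Lemma A.1. If $F : [0,1]^X \to \mathbb{R}$ is the multilinear extension of a submodular function, and $\mathbf{x}' \ge \mathbf{x}$ (i.e. $\mathbf{x}' = \mathbf{x} + \mathbf{y}$ with $\mathbf{y} \ge 0$), then

$$F(\mathbf{x}') \le F(\mathbf{x}) + \sum_{i \in X}(x'_i - x_i)\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}}.$$

Similarly,

$$F(\mathbf{x}') \ge F(\mathbf{x}) + \sum_{i \in X}(x'_i - x_i)\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}'}.$$

Proof: Since $F$ is the multilinear extension of a submodular function, we know that $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for all $i, j$ [4]. This means that whenever $\mathbf{x} \le \mathbf{x}'$, the partial derivatives at $\mathbf{x}'$ cannot be larger than at $\mathbf{x}$:

$$\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} \ge \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}'}.$$

Therefore, between $\mathbf{x}$ and $\mathbf{x}'$, the highest partial derivatives are attained at $\mathbf{x}$, and the lowest at $\mathbf{x}'$. By integrating along the line segment between $\mathbf{x}$ and $\mathbf{x}'$, we obtain

$$F(\mathbf{x}') - F(\mathbf{x}) = \int_0^1 (\mathbf{x}' - \mathbf{x}) \cdot \nabla F(\mathbf{x} + t(\mathbf{x}' - \mathbf{x}))\,dt = \sum_{i \in X}\int_0^1 (x'_i - x_i)\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x} + t(\mathbf{x}' - \mathbf{x})} dt.$$

Since $x'_i - x_i \ge 0$, evaluating the partial derivatives at $\mathbf{x}$ instead gives

$$F(\mathbf{x}') - F(\mathbf{x}) \le \sum_{i \in X}(x'_i - x_i)\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}}.$$

If we evaluate the partial derivatives at $\mathbf{x}'$, we get

$$F(\mathbf{x}') - F(\mathbf{x}) \ge \sum_{i \in X}(x'_i - x_i)\left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}'}.$$

□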
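A small numerical sanity check of Lemma A.1 in Python, assuming as the submodular function a directed-graph cut function with random nonnegative weights (a hypothetical test instance). The multilinear extension is evaluated exactly by summing over all subsets, and the partial derivatives use multilinearity, $\partial F/\partial x_i = F(\mathbf{x} \mid x_i = 1) - F(\mathbf{x} \mid x_i = 0)$. Random pairs $\mathbf{x} \le \mathbf{x}'$ are drawn and both inequalities are tested.

```python
import itertools
import random

def multilinear(f, x):
    """Exact multilinear extension F(x) = E[f(R)], each i in R independently w.p. x[i]."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def partial(f, x, i):
    """dF/dx_i = F(x with x_i = 1) - F(x with x_i = 0), since F is multilinear."""
    hi, lo = list(x), list(x)
    hi[i], lo[i] = 1.0, 0.0
    return multilinear(f, hi) - multilinear(f, lo)

def dicut(edges):
    """Weighted directed cut function: total weight of arcs leaving S."""
    return lambda S: sum(w for u, v, w in edges if u in S and v not in S)

def check_lemma_a1(trials=200, n=4, seed=0, tol=1e-9):
    rng = random.Random(seed)
    f = dicut([(u, v, rng.random()) for u in range(n) for v in range(n) if u != v])
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        xp = [xi + rng.random() * (1.0 - xi) for xi in x]        # x' >= x
        diff = multilinear(f, xp) - multilinear(f, x)
        upper = sum((xp[i] - x[i]) * partial(f, x, i) for i in range(n))
        lower = sum((xp[i] - x[i]) * partial(f, xp, i) for i in range(n))
        if not lower - tol <= diff <= upper + tol:
            return False
    return True

print(check_lemma_a1())
```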



For a small increase in each coordinate, the partial derivatives give a good approximation of the change 
in F; this is a standard analytic argument, which we formalize in the next lemma. 
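Lemma A.2. Let $F : [0,1]^X \to \mathbb{R}$ be twice differentiable, $\mathbf{x} \in [0,1]^X$ and $\mathbf{y} \in [-\delta, \delta]^X$. Then

$$\left| F(\mathbf{x} + \mathbf{y}) - F(\mathbf{x}) - \sum_{i \in X} y_i \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} \right| \le \delta^2 n^2 \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|,$$

where the supremum is taken over all $i, j$ and all points in $[0,1]^X$.

Proof: Let $M = \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|$. Since $F$ is twice differentiable, any partial derivative can change by at most $\delta M$ when a single coordinate changes by at most $\delta$, and hence by at most $\delta n M$ when all $n$ coordinates do. Hence,

$$-\delta n M \le \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x} + t\mathbf{y}} - \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} \le \delta n M$$

for any $t \in [0,1]$. By the fundamental theorem of calculus, using $|y_i| \le \delta$,

$$F(\mathbf{x} + \mathbf{y}) = F(\mathbf{x}) + \sum_{i \in X}\int_0^1 y_i \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x} + t\mathbf{y}} dt \le F(\mathbf{x}) + \sum_{i \in X}\left(y_i \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} + \delta^2 n M\right) = F(\mathbf{x}) + \sum_{i \in X} y_i \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} + \delta^2 n^2 M.$$

Similarly we get $F(\mathbf{x} + \mathbf{y}) \ge F(\mathbf{x}) + \sum_{i \in X} y_i \left.\frac{\partial F}{\partial x_i}\right|_{\mathbf{x}} - \delta^2 n^2 M$. □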



The following "threshold lemma" appears as Lemma A.4 in [34]. We remark that the expression $\mathbb{E}[f(T_{>\lambda}(\mathbf{x}))]$ defined below is an alternative definition of the Lovász extension of $f$.

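Lemma A.3 (Threshold Lemma). For $\mathbf{y} \in [0,1]^X$ and $\lambda \in [0,1]$, define $T_{>\lambda}(\mathbf{y}) = \{i : y_i > \lambda\}$. If $F$ is the multilinear extension of a submodular function $f$, then for $\lambda \in [0,1]$ uniformly random

$$F(\mathbf{y}) \ge \mathbb{E}\left[f(T_{>\lambda}(\mathbf{y}))\right].$$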
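The threshold lemma can be illustrated numerically. In the Python sketch below, $\mathbb{E}[f(T_{>\lambda}(\mathbf{y}))]$ is computed exactly: between consecutive breakpoints $lo < hi$ of $\{0, 1\} \cup \{y_i\}$, the set $T_{>\lambda}(\mathbf{y})$ is constant and equals $\{i : y_i \ge hi\}$. The budget-additive test function $f(S) = \min(\sum_{i \in S} w_i, c)$ is a hypothetical choice of a submodular $f$; the script compares the two sides of the lemma on random points.

```python
import itertools
import random

def multilinear(f, x):
    """Exact multilinear extension F(x) = E[f(R)], each i in R independently w.p. x[i]."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def threshold_expectation(f, y):
    """E[f(T_{>lambda}(y))] for lambda uniform in [0, 1] (the Lovasz extension).
    For lambda in [lo, hi) between consecutive breakpoints, T = {i : y_i >= hi}."""
    points = sorted(set(y) | {0.0, 1.0})
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        total += (hi - lo) * f(frozenset(i for i, yi in enumerate(y) if yi >= hi))
    return total

def budget_additive(weights, cap):
    """f(S) = min(sum of weights in S, cap): a hypothetical submodular test function."""
    return lambda S: min(sum(weights[i] for i in S), cap)

rng = random.Random(1)
f = budget_additive([rng.random() for _ in range(5)], cap=1.2)
gaps = []
for _ in range(200):
    y = [rng.random() for _ in range(5)]
    gaps.append(multilinear(f, y) - threshold_expectation(f, y))
print(min(gaps) >= -1e-12)
```

On integral points the two sides coincide, since both reduce to $f$ of the corresponding set.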

Since we apply this lemma in various places of the paper, let us describe some applications of it in detail.

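Example A.4. In this example we apply the threshold lemma to the vector $\mathbf{x} = p\mathbf{1}_{A \cap C} + (1-p)\mathbf{1}_{B \cap C}$. Here $C$ represents the optimum set, $B = \bar{A}$ and $1/2 < p < 1$. If $\lambda \in [0,1]$ is chosen uniformly at random, we know that $0 \le \lambda < 1-p$ with probability $1-p$, $1-p \le \lambda < p$ with probability $2p-1$, and $p \le \lambda \le 1$ with probability $1-p$. Therefore by Lemma A.3 we have:

$$\begin{aligned} F(\mathbf{x}) &\ge (1-p)\,\mathbb{E}[f(T_{>\lambda}(\mathbf{x})) \mid \lambda < 1-p] + (2p-1)\,\mathbb{E}[f(T_{>\lambda}(\mathbf{x})) \mid 1-p \le \lambda < p] + (1-p)\,\mathbb{E}[f(T_{>\lambda}(\mathbf{x})) \mid p \le \lambda \le 1] \\ &= (1-p) f(C) + (2p-1) f(A \cap C) + (1-p) f(\emptyset), \end{aligned}$$

or equivalently, in diagram form (each $2 \times 2$ array lists the values on $A \cap C$, $B \cap C$ in the top row and on $A \setminus C$, $B \setminus C$ in the bottom row), we can write

$$F\begin{bmatrix} p & 1-p \\ 0 & 0 \end{bmatrix} \ge (1-p)\, f\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} + (2p-1)\, f\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + (1-p)\, f\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \qquad (8)$$

In the next example we consider a more complicated application of the threshold lemma.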



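Example A.5. Consider the vector $\mathbf{x}$ where $x_i = 1$ for $i \in C$, $x_i = t$ for $i \in A \setminus C$ and $x_i < t$ for $i \in B \setminus C$. In this case, we denote

$$F(\mathbf{x}) = F\begin{bmatrix} 1 & 1 \\ t & \mathbf{x}_{B \setminus C} \end{bmatrix}.$$

Again $C$ is the optimal set and $B = \bar{A}$. In this case, if we apply the threshold lemma, we get a random set which can contain a part of the block $B \setminus C$. In particular, observe that if $\lambda < t$, then $T_{>\lambda}(\mathbf{x})$ contains all the elements of $C$ and of $A \setminus C$, and, depending on the value of $\lambda$, the elements of $B \setminus C$ whose value is greater than $\lambda$. We denote the value of such a set by

$$f(T_{>\lambda}(\mathbf{x})) = f\begin{bmatrix} 1 & 1 \\ 1 & \mathbf{1}_{\{x_i > \lambda\}} \end{bmatrix},$$

where the lower right block is divided into two parts depending on the threshold $\lambda$. If $\lambda \ge t$, then $T_{>\lambda}(\mathbf{x}) = C$. Therefore

$$F(\mathbf{x}) \ge t\,\mathbb{E}[f(T_{>\lambda}(\mathbf{x})) \mid \lambda < t] + (1-t)\,\mathbb{E}[f(T_{>\lambda}(\mathbf{x})) \mid \lambda \ge t],$$

which can be written equivalently as

$$F\begin{bmatrix} 1 & 1 \\ t & \mathbf{x}_{B \setminus C} \end{bmatrix} \ge t\,\mathbb{E}\left[ f\begin{bmatrix} 1 & 1 \\ 1 & \mathbf{1}_{\{x_i > \lambda\}} \end{bmatrix} \,\middle|\, \lambda < t \right] + (1-t)\, f\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}. \qquad (9)$$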



A further generalization of the threshold lemma is the following, which is also useful in our analysis. (See [34, Lemma A.5].)

Lemma A.6. For any partition $X = X_1 \cup X_2$,

$$F(\mathbf{x}) \ge \mathbb{E}\left[f\bigl((T_{>\lambda_1}(\mathbf{x}) \cap X_1) \cup (T_{>\lambda_2}(\mathbf{x}) \cap X_2)\bigr)\right],$$

where $\lambda_1, \lambda_2$ are independent and uniformly random in $[0,1]$.



B Analysis of the 0.41-approximation

Here we finish the analysis of the simulated annealing algorithm for unconstrained submodular maximization (Theorem 3.1). Consider Lemma 3.3 in the limit when $\delta \to 0$. It gives the following differential inequality:

$$(1 - p)\,\Phi'(p) \ge \mathrm{OPT} - 2\Phi(p) - (2p - 1)\beta. \qquad (10)$$



We assume here that $\delta$ is so small that the difference between the solution of this differential inequality and the actual behavior of our algorithm is negligible. (We could replace OPT by $(1 - \epsilon)\mathrm{OPT}$, carry out the analysis and then let $\epsilon \to 0$; however, we shall spare the reader this annoyance.) Our next step is to solve this differential equation, given certain initial conditions. Without loss of generality, we assume that $\mathrm{OPT} = 1$.

Lemma B.1. Assume that $\mathrm{OPT} = 1$. Let $\Phi(p)$ denote the value of the solution at temperature $t = 1 - p$. Assume that $\Phi(p_0) = v_0$ for some $p_0 \in (\frac{1}{2}, 1)$, and $f(A(p)) \le \beta$ for all $p$. Then for any $p \in (p_0, 1)$,

$$\Phi(p) \ge \frac{1}{2}(1-\beta) + 2\beta(1-p) - \frac{(1-p)^2}{(1-p_0)^2}\left(\frac{1}{2}(1-\beta) + 2\beta(1-p_0) - v_0\right).$$
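Proof: We rewrite inequality (10) using the following trick:

$$(1-p)^3 \frac{d}{dp}\left((1-p)^{-2}\Phi(p)\right) = (1-p)^3\left(2(1-p)^{-3}\Phi(p) + (1-p)^{-2}\Phi'(p)\right) = 2\Phi(p) + (1-p)\Phi'(p).$$

Therefore, inequality (10) states that

$$(1-p)^3 \frac{d}{dp}\left((1-p)^{-2}\Phi(p)\right) \ge \mathrm{OPT} - (2p-1)\beta = 1 - (2p-1)\beta = 1 - \beta + 2\beta(1-p),$$

which is equivalent to

$$\frac{d}{dp}\left((1-p)^{-2}\Phi(p)\right) \ge \frac{1-\beta}{(1-p)^3} + \frac{2\beta}{(1-p)^2}.$$

For any $p \in (p_0, 1)$, the fundamental theorem of calculus implies that

$$\begin{aligned} (1-p)^{-2}\Phi(p) - (1-p_0)^{-2}\Phi(p_0) &\ge \int_{p_0}^{p}\left(\frac{1-\beta}{(1-\tau)^3} + \frac{2\beta}{(1-\tau)^2}\right)d\tau \\ &= \left[\frac{1-\beta}{2(1-\tau)^2} + \frac{2\beta}{1-\tau}\right]_{p_0}^{p} \\ &= \frac{1-\beta}{2(1-p)^2} + \frac{2\beta}{1-p} - \frac{1-\beta}{2(1-p_0)^2} - \frac{2\beta}{1-p_0}. \end{aligned}$$

Multiplying by $(1-p)^2$, we obtain

$$\Phi(p) \ge \frac{1}{2}(1-\beta) + 2\beta(1-p) + \frac{(1-p)^2}{(1-p_0)^2}\left(\Phi(p_0) - \frac{1}{2}(1-\beta) - 2\beta(1-p_0)\right).$$

□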
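Since the right-hand side of Lemma B.1 is exactly the solution of (10) taken with equality, it can be cross-checked by integrating that equation numerically. The sketch below uses the classical fourth-order Runge-Kutta method; the parameters $\beta = 0.4$, $p_0 = 1/2$, $v_0 = 1/4$ are arbitrary illustrative values, and integration stops at $p = 0.9$ because the equation is singular at $p = 1$.

```python
def rhs(p, phi, beta):
    """Equality case of (10) with OPT = 1: (1 - p) phi' = 1 - 2 phi - (2p - 1) beta."""
    return (1.0 - 2.0 * phi - (2.0 * p - 1.0) * beta) / (1.0 - p)

def closed_form(p, beta, p0, v0):
    """Right-hand side of Lemma B.1."""
    a = 0.5 * (1.0 - beta)
    return (a + 2.0 * beta * (1.0 - p)
            - (1.0 - p) ** 2 / (1.0 - p0) ** 2 * (a + 2.0 * beta * (1.0 - p0) - v0))

def rk4(beta, p0, v0, p_end, steps=4000):
    """Classical fourth-order Runge-Kutta integration from (p0, v0) to p_end < 1."""
    h = (p_end - p0) / steps
    p, phi = p0, v0
    for _ in range(steps):
        k1 = rhs(p, phi, beta)
        k2 = rhs(p + h / 2, phi + h * k1 / 2, beta)
        k3 = rhs(p + h / 2, phi + h * k2 / 2, beta)
        k4 = rhs(p + h, phi + h * k3, beta)
        phi += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        p += h
    return phi

beta, p0, v0 = 0.4, 0.5, 0.25
print(rk4(beta, p0, v0, 0.9), closed_form(0.9, beta, p0, v0))
```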



In order to use this lemma, recall that the parameter $\beta$ is an upper bound on the values of $f(A)$ throughout the algorithm. This means that we can choose $\beta$ to be our "target value": if $f(A)$ achieves value more than $\beta$ at some point, we are done. If $f(A)$ is always upper-bounded by $\beta$, we can use Lemma B.1, hopefully concluding that for some $p$ we must have $\Phi(p) \ge \beta$.

In addition, we need to choose a suitable initial condition. As a first attempt, we can try to plug in $p_0 = 1/2$ and $v_0 = 1/4$ as a starting point (the uniformly random 1/4-approximation provided by [9]). We would obtain

$$\Phi(p) \ge \frac{1}{2}(1-\beta) + 2\beta(1-p) - (1+2\beta)(1-p)^2.$$

However, this is not good enough. For example, if we choose $\beta = 2/5$ as our target value, we obtain $\Phi(p) \ge \frac{3}{10} + \frac{4}{5}(1-p) - \frac{9}{5}(1-p)^2$. It can be verified that this function stays strictly below $2/5$ for all $p \in [\frac{1}{2}, 1]$. So this does not even match the performance of the 2/5-approximation of [9].

As a second attempt, we can use the 2/5-approximation itself as a starting point. The analysis of [9] implies that if $A$ is a local optimum for $p_0 = 2/3$, we have either $f(A) \ge 2/5$, or $F(\mathbf{x}_p(A)) \ge 2/5$. This means that we can use the starting point $p_0 = 2/3$, $v_0 = 2/5$ with a target value of $\beta = 2/5$ (effectively ignoring the behavior of the algorithm for $p < 2/3$). Lemma B.1 gives

$$\Phi(p) \ge \frac{3}{10} + \frac{4}{5}(1-p) - \frac{3}{2}(1-p)^2.$$

The maximum of this function is attained at $p = 11/15$, which gives $\Phi(11/15) \ge 61/150 > 2/5$. This is a good sign; however, it does not imply that the algorithm actually achieves a 61/150-approximation, because we have used $\beta = 2/5$ as our target value. (Also, note that $61/150 < 0.41$, so this is not the way we achieve our main result.)
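Both attempts reduce to maximizing a concave quadratic in $u = 1 - p$, so they can be checked in exact rational arithmetic. In the sketch below, `lemma_b1_coefficients` and `maximize` are illustrative helper names; the maximizer must lie in $(0, 1 - p_0)$ for the bound to apply, which holds in both cases ($u^* = 2/9 < 1/2$ and $u^* = 4/15 < 1/3$).

```python
from fractions import Fraction as Fr

def lemma_b1_coefficients(beta, p0, v0):
    """Write the Lemma B.1 bound as a + b*u - c*u**2 with u = 1 - p."""
    u0 = 1 - p0
    a = (1 - beta) / 2
    b = 2 * beta
    c = (a + b * u0 - v0) / u0 ** 2
    return a, b, c

def maximize(a, b, c):
    """Maximizer u* = b / (2c) and maximum of a + b*u - c*u**2 (requires c > 0)."""
    u = b / (2 * c)
    return u, a + b * u - c * u * u

# First attempt: p0 = 1/2, v0 = 1/4, target beta = 2/5.
u1, best1 = maximize(*lemma_b1_coefficients(Fr(2, 5), Fr(1, 2), Fr(1, 4)))
# Second attempt: p0 = 2/3, v0 = 2/5, target beta = 2/5.
u2, best2 = maximize(*lemma_b1_coefficients(Fr(2, 5), Fr(2, 3), Fr(2, 5)))
print(1 - u1, best1)   # 7/9 7/18   (maximum below 2/5)
print(1 - u2, best2)   # 11/15 61/150   (maximum above 2/5)
```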

In order to get an approximation guarantee better than 2/5, we need to revisit the analysis of [9] and compute the approximation factor of a local optimum as a function of the temperature $t = 1 - p$ and the complementary solution $f(A) = \beta$.

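Lemma B.2. Assume $\mathrm{OPT} = 1$. Let $q \in [\frac{1}{3}, \frac{1}{1+\sqrt{2}}]$, $p = 1 - q$, and let $A$ be a local optimum with respect to $F(\mathbf{x}_p(A))$. Let $\beta \ge \max\{f(A), f(\bar{A})\}$. Then

$$F(\mathbf{x}_p(A)) \ge \frac{1}{2}(1 - q^2) - q(1 - 2q)\beta.$$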

Proof: $A$ is a local optimum with respect to the objective function $F(\mathbf{x}_p(A))$. We denote $\mathbf{x}_p(A)$ simply by $\mathbf{x}$. Let $C$ be a global optimum and $B = \bar{A}$. As we argued in the proof of Lemma 3.3, we have

and also
We apply Lemma A.6, which states that $F(\mathbf{x}) \ge \mathbb{E}[f((T_{>\lambda_1}(\mathbf{x}) \cap C) \cup (T_{>\lambda_2}(\mathbf{x}) \setminus C))]$, where $\lambda_1, \lambda_2$ are independent and uniformly random in $[0,1]$. This yields the following (after dropping some terms which are nonnegative):

(11)

(12)



The first term in each bound is $pq \cdot \mathrm{OPT}$. However, to make use of the remaining terms, we must add some terms on both sides. The terms we add are $\frac{1}{2}(-p^3 + p^2 q + 2pq^2) f(A) + \frac{1}{2}(p^3 + p^2 q - 2pq^2 - 2q^3) f(B)$; it can be verified that both coefficients are nonnegative for $q \in [\frac{1}{3}, \frac{1}{1+\sqrt{2}}]$ (indeed, using $p + q = 1$, we have $-p^3 + p^2 q + 2pq^2 = p(2q - p)$ and $p^3 + p^2 q - 2pq^2 - 2q^3 = p^2 - 2q^2$). Also, the coefficients are chosen so that they sum up to $p^2 q - q^3 = q(p^2 - q^2) = q(p - q)$, the coefficient in front of the last term in each equation. Using submodularity, we get
Similarly, we get

(13)

Putting equations (11), (12), (13) and (14) all together, we get
where the simplification came about by using the elementary relations $p(p - q) = p(p - q)(p + q) = p(p^2 - q^2)$ and $q^2 = q^2(p + q)$. Submodularity implies
so we get, replacing the respective diagrams by $F(\mathbf{x})$, $f(A)$ and $f(B)$,

$$2F(\mathbf{x}) + (-p^3 + p^2 q + 2pq^2) f(A) + (p^3 + p^2 q - 2pq^2 - 2q^3) f(B) \ge (2pq + p^2)\,\mathrm{OPT} = (1 - q^2)\,\mathrm{OPT},$$

again using $(p + q)^2 = 1$. Finally, we assume that $f(A) \le \beta$ and $f(B) \le \beta$, which means

$$2F(\mathbf{x}) \ge (1 - q^2)\,\mathrm{OPT} - (2p^2 q - 2q^3)\beta = (1 - q^2)\,\mathrm{OPT} - 2q(p - q)\beta = (1 - q^2)\,\mathrm{OPT} - 2q(1 - 2q)\beta.$$

□
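The claim that both added coefficients are nonnegative on $q \in [\frac{1}{3}, \frac{1}{1+\sqrt{2}}]$ and sum to $q(1 - 2q)$ can be spot-checked on a grid. Using $p + q = 1$, the coefficients simplify to $\frac{1}{2}p(2q - p)$ and $\frac{1}{2}(p^2 - 2q^2)$, which vanish at $q = 1/3$ and $q = 1/(1+\sqrt{2})$ respectively, so the interval cannot be enlarged. The Python sketch below is only a numerical illustration of this; `added_coefficients` is an illustrative name.

```python
import math

def added_coefficients(q):
    """Coefficients of f(A) and f(B) added in the proof of Lemma B.2 (p = 1 - q)."""
    p = 1 - q
    c_a = (-p**3 + p**2 * q + 2 * p * q**2) / 2
    c_b = (p**3 + p**2 * q - 2 * p * q**2 - 2 * q**3) / 2
    return c_a, c_b

lo, hi = 1 / 3, 1 / (1 + math.sqrt(2))
grid = [lo + (hi - lo) * j / 1000 for j in range(1001)]
ok = all(min(added_coefficients(q)) >= -1e-12 and
         abs(sum(added_coefficients(q)) - q * (1 - 2 * q)) < 1e-12
         for q in grid)
# Just outside the interval one of the coefficients becomes negative.
print(ok, added_coefficients(0.30), added_coefficients(0.45))
```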

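Now we can finally prove Theorem 3.1. Consider Lemma B.1. Starting from $\Phi(p_0) = v_0$, we obtain the following bound for any $p \in (p_0, 1)$:

$$\Phi(p) \ge \frac{1}{2}(1-\beta) + 2\beta(1-p) - \frac{(1-p)^2}{(1-p_0)^2}\left(\frac{1}{2}(1-\beta) + 2\beta(1-p_0) - v_0\right).$$

By optimizing this quadratic function, we obtain that the maximum is attained at $p_1 = 1 - \frac{\beta(1-p_0)^2}{(1-\beta)/2 + 2\beta(1-p_0) - v_0}$ and the corresponding bound is

$$\Phi(p_1) \ge \frac{1}{2}(1-\beta) + \frac{\beta^2(1-p_0)^2}{(1-\beta)/2 + 2\beta(1-p_0) - v_0}.$$

Lemma B.2 implies that a local optimum at temperature $q = 1 - p_0 \in [\frac{1}{3}, \frac{1}{1+\sqrt{2}}]$ has value $v_0 \ge \frac{1}{2}(1 - q^2) - q(1 - 2q)\beta = p_0 - \frac{1}{2}p_0^2 - (1 - p_0)(2p_0 - 1)\beta$. Therefore, we obtain

$$\Phi(p_1) \ge \frac{1}{2}(1-\beta) + \frac{\beta^2(1-p_0)^2}{(1-\beta)/2 + 2\beta(1-p_0) - p_0 + \frac{1}{2}p_0^2 + (1-p_0)(2p_0-1)\beta}.$$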

We choose $p_0 = \sqrt{2}/(1 + \sqrt{2})$ and solve for a value of $\beta$ such that $\Phi(p_1) \ge \beta$. This value can be found as a solution of a quadratic equation and is equal to



/3= -i- (37 + 22V2 + (3072 + 14)\/-5V2 + 10 



It can be verified that $\beta > 0.41$. This completes the proof of Theorem 3.1.
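As a numerical sanity check of $\beta > 0.41$ (not a derivation of the closed form above), the sketch below solves $\Phi(p_1) = \beta$ by bisection, using the bound on $\Phi(p_1)$ and the Lemma B.2 value of $v_0$ exactly as displayed in this proof, with $p_0 = \sqrt{2}/(1+\sqrt{2})$. The bracketing interval $[0.40, 0.42]$ is an assumption, chosen because the guaranteed value exceeds the target at $0.40$ and falls below it at $0.42$.

```python
import math

def bound_at_maximizer(beta, p0):
    """Return (p1, lower bound on Phi(p1)) from Lemma B.1, with v0 replaced by the
    Lemma B.2 guarantee at q = 1 - p0 (OPT = 1)."""
    v0 = p0 - p0 ** 2 / 2 - (1 - p0) * (2 * p0 - 1) * beta
    denom = (1 - beta) / 2 + 2 * beta * (1 - p0) - v0
    p1 = 1 - beta * (1 - p0) ** 2 / denom
    return p1, (1 - beta) / 2 + beta ** 2 * (1 - p0) ** 2 / denom

def largest_target(p0, lo=0.40, hi=0.42, iters=60):
    """Bisection for the beta at which the guaranteed value equals the target beta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bound_at_maximizer(mid, p0)[1] >= mid:
            lo = mid
        else:
            hi = mid
    return lo

p0 = math.sqrt(2) / (1 + math.sqrt(2))
beta = largest_target(p0)
p1, value = bound_at_maximizer(beta, p0)
print(round(beta, 5), round(p1, 3))
```

The maximizer $p_1$ lands inside $(p_0, 1)$, so the bound of Lemma B.1 indeed applies there.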

C Upper Bounding the Performance of the Simulated Annealing Algorithm

In this section we show that the simulated annealing algorithm (Algorithm 1) for unconstrained submodular maximization does not give a 1/2-approximation even on instances of the directed maximum cut problem. We provide a directed graph $G$ (found by an LP solver) and a set of local optima for all values of $p \in [1/2, 1]$, such that the value of $f$ on each of them or their complement is at most 0.486 of OPT.

Theorem C.1. There exists an instance of the unconstrained submodular maximization problem such that the approximation factor of Algorithm 1 is $17/35 < 0.486$.
Figure 3: Hard instance of the unconstrained submodular maximization problem, where Algorithm 1 may get value no more than 17. The bold vertices $\{4, 5, 6, 7\}$ represent the optimum set with value $\mathrm{OPT} = 35$.



Proof: Let $f$ be the cut function of the directed graph $G$ in Figure 3. We show that the set $A = \{1, 3, 5, 7\}$ is a local optimum for all $p \in [\frac{1}{2}, \frac{3}{4}]$ and the set $B = \{2, 4, 6, 8\}$ is a local optimum for all $p \in [\frac{3}{4}, 1]$. Moreover, since we have $F(\mathbf{x}_{3/4}(A)) = F(\mathbf{x}_{3/4}(B)) = 16.25$, it is possible that in a run of the simulated annealing algorithm (Algorithm 1) the set $A$ is chosen and remains a local optimum from $p = 1/2$ to $p = 3/4$. Then the local optimum changes to $B$ and remains until the end of the algorithm. If the algorithm follows this path, then its approximation ratio is $17/35$. This is because the value of the optimum set is $f(\{4, 5, 6, 7\}) = 35$, while $\max\{f(A), f(B), f(\bar{A}), f(\bar{B})\} = 17$. We remark that even sampling from $A, \bar{A}$ (or from $B, \bar{B}$) with probabilities $p, q$ does not give value more than 17.

It remains to show that the set $A$ is in fact a local optimum for all $p \in [\frac{1}{2}, \frac{3}{4}]$. We just need to show that all the elements in $A$ have a non-negative partial derivative and the elements in $\bar{A}$ have a non-positive partial derivative. Let $p \in [\frac{1}{2}, \frac{3}{4}]$ and $q = 1 - p$; then:



§f- = -Uq + 4p<0 
g- = -3q + p<0 
|J = 15p-g-15p4 



g = 



Si? 



Si? 



llp+5q + llp 
= -p + 3q>0 
Ap+12q > 



5(7 ==0 



Therefore, $A$ is a local optimum for $p \in [\frac{1}{2}, \frac{3}{4}]$. Similarly, it can be shown that $B$ is a local optimum for $p \in [\frac{3}{4}, 1]$, which concludes the proof. □
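The sign test used in this proof is easy to automate for a directed cut function, whose multilinear extension is $F(\mathbf{x}) = \sum_{(u,v)} w_{uv}\, x_u(1 - x_v)$. The Python sketch below checks, for $\mathbf{x} = p\mathbf{1}_A + q\mathbf{1}_{\bar{A}}$, that every partial derivative is nonnegative on $A$ and nonpositive on $\bar{A}$, sampling $p$ on an interval. Since the edge list of Figure 3 is not reproduced in the text, the example uses a hypothetical three-vertex path instead; the function names are illustrative.

```python
def dicut_gradient(edges, x):
    """Gradient of F(x) = sum over arcs (u, v, w) of w * x_u * (1 - x_v),
    the multilinear extension of a weighted directed cut function."""
    grad = [0.0] * len(x)
    for u, v, w in edges:
        grad[u] += w * (1.0 - x[v])
        grad[v] -= w * x[u]
    return grad

def is_local_optimum(edges, n, A, p, tol=1e-12):
    """Sign test from the proof: with x = p on A and q = 1 - p elsewhere,
    dF/dx_i >= 0 for every i in A and dF/dx_i <= 0 for every i outside A."""
    x = [p if i in A else 1.0 - p for i in range(n)]
    grad = dicut_gradient(edges, x)
    return all(g >= -tol if i in A else g <= tol for i, g in enumerate(grad))

def local_on_interval(edges, n, A, p_lo, p_hi, samples=101):
    """Apply the sign test on an evenly spaced grid of p values in [p_lo, p_hi]."""
    return all(is_local_optimum(edges, n, A, p_lo + (p_hi - p_lo) * j / (samples - 1))
               for j in range(samples))

# Hypothetical directed path 0 -> 1 -> 2 with unit weights (not the graph of Figure 3).
path = [(0, 1, 1.0), (1, 2, 1.0)]
print(local_on_interval(path, 3, {0}, 0.5, 1.0), is_local_optimum(path, 3, {0, 2}, 0.7))
```

Sampling a grid only gives evidence for an interval; the proof instead checks linear expressions in $p$ and $q$ at the endpoints.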
D Analysis of the 0.325-approximation



Our first goal is to prove Lemma 4.5. As we discussed, the key step is to compare the gain in the temperature relaxation step to the value of the derivative on the line towards the optimum, $G(\mathbf{x}) = (\mathbf{1}_C - \mathbf{x}) \cdot \nabla F(\mathbf{x})$. We prove the following.

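Lemma D.1. Let $\mathbf{x}(t)$ be the local optimum at time $t < 1 - 1/n$. Then

$$\frac{1-t}{\delta}\bigl(F(\mathbf{x}(t+\delta)) - F(\mathbf{x}(t))\bigr) \ge G(\mathbf{x}(t)) - n^2\delta \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|.$$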

This lemma can be compared to the first part of the proof of Lemma 3.3, which is not very complicated in the unconstrained case. As we said, the main difficulty here is that relaxing the temperature does not automatically allow us to increase all the coordinates with a positive partial derivative. The reason is that the new fractional solution might not belong to $P_{t+\delta}(\mathcal{M})$. Instead, the algorithm modifies coordinates according to a certain maximum-weight matching found in Step 12. The next lemma shows that the weight of this matching is comparable to $G(\mathbf{x})$.

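Lemma D.2. Let $\mathbf{x} = \frac{1}{N}\sum_{\ell=1}^{N} \mathbf{1}_{I_\ell} \in P_t(\mathcal{M})$ be a fractional local optimum, and $C \in \mathcal{I}$ a global optimum. Assume that $(1 - t)N \ge n$. Let $G_{\mathbf{x}}$ be the fractional exchange graph defined in Def. 4.3. Then $G_{\mathbf{x}}$ has a matching $M$ of weight

$$w(M) \ge \frac{1}{1-t}\, G(\mathbf{x}).$$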

Proof: We use a basic property of matroids (see [30]) which says that for any two independent sets $C, I \in \mathcal{I}$, there is a mapping $m : C \setminus I \to (I \setminus C) \cup \{0\}$ such that for each $i \in C \setminus I$, $I - m(i) + i$ is independent, and each element of $I \setminus C$ appears at most once as $m(i)$. I.e., $m$ is a matching, except for the special element $0$, which can be used as $m(i)$ whenever $I + i \in \mathcal{I}$. Let us fix such a mapping for each pair $C, I_\ell$, and denote the respective mapping by $m_\ell : C \setminus I_\ell \to (I_\ell \setminus C) \cup \{0\}$.

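Denote by $W$ the sum of all positive edge weights in $G_{\mathbf{x}}$. We estimate $W$ as follows. For each $i \in A \cap C$ and each edge $(i, \ell)$, we have $i \in A \cap C \setminus I_\ell$ and by Lemma 4.2

$$w_{i\ell} = \frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{b_\ell(i)}} \ge \frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{m_\ell(i)}}.$$

Observe that for $i \in (C \setminus A) \setminus I_\ell$, we get

$$\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{m_\ell(i)}} \le 0,$$

because otherwise we could replace $I_\ell$ by $I_\ell - m_\ell(i) + i$, which would increase the objective function (and for elements outside of $A$, we have $x_i < t$, so $x_i$ can be increased). Let us add up the first inequality over all elements $i \in A \cap C \setminus I_\ell$ and the second inequality over all elements $i \in (C \setminus A) \setminus I_\ell$:

$$\sum_{i \in A \cap C \setminus I_\ell} w_{i\ell} \ge \sum_{i \in C \setminus I_\ell}\left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{m_\ell(i)}}\right) \ge \sum_{i \in C \setminus I_\ell} \frac{\partial F}{\partial x_i} - \sum_{j \in I_\ell \setminus C} \frac{\partial F}{\partial x_j},$$

where we used the fact that each element of $I_\ell \setminus C$ appears at most once as $m_\ell(i)$, and $\frac{\partial F}{\partial x_j} \ge 0$ for any element $j \in I_\ell$ (otherwise we could remove it and improve the objective value). Now it remains to add up these inequalities over all $\ell = 1, \ldots, N$:

$$\sum_{\ell=1}^{N} \sum_{i \in A \cap C \setminus I_\ell} w_{i\ell} \ge \sum_{\ell=1}^{N}\left(\sum_{i \in C \setminus I_\ell} \frac{\partial F}{\partial x_i} - \sum_{j \in I_\ell \setminus C} \frac{\partial F}{\partial x_j}\right) = N \sum_{i \in C}(1 - x_i)\frac{\partial F}{\partial x_i} - N \sum_{j \notin C} x_j \frac{\partial F}{\partial x_j},$$

using $x_i = \frac{1}{N}|\{\ell : i \in I_\ell\}|$. The left-hand side is a sum of weights over a subset of edges. Hence, the sum of all positive edge weights also satisfies

$$W \ge N \sum_{i \in C}(1 - x_i)\frac{\partial F}{\partial x_i} - N \sum_{j \notin C} x_j \frac{\partial F}{\partial x_j} = N \cdot G(\mathbf{x}).$$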

Finally, we apply Kőnig's theorem on edge colorings of bipartite graphs: every bipartite graph of maximum degree $\Delta$ has an edge coloring using at most $\Delta$ colors. The degree of each node $i \in A$ is the number of sets $I_\ell$ not containing $i$, which is $(1 - t)N$, and the degree of each node $\ell \in [N]$ is at most the number of elements $n$, where by assumption $n \le (1 - t)N$. By Kőnig's theorem, there is an edge coloring using $(1 - t)N$ colors. Each color class is a matching, and by averaging, the positive edge weights in some color class have total weight

$$w(M) \ge \frac{W}{(1 - t)N} \ge \frac{1}{1 - t}\, G(\mathbf{x}).$$

□
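The averaging step can be made concrete. The Python sketch below implements the alternating-path proof of Kőnig's edge-coloring theorem for a bipartite graph, then picks the color class with the largest total positive weight; since the classes partition the edges, this weight is at least $W/\Delta$. The random graph and weights are hypothetical stand-ins for $G_{\mathbf{x}}$, and the sketch does not model the exchange graph of Def. 4.3 itself.

```python
import random
from collections import defaultdict

def konig_edge_coloring(edges, delta):
    """Properly color the edges of a simple bipartite graph with `delta` colors
    (delta >= maximum degree) via the alternating-path proof of Konig's theorem.
    The two sides must use disjoint node labels, e.g. ('L', i) and ('R', j)."""
    at = defaultdict(dict)                     # at[node][color] = neighbor via that color
    for u, v in edges:
        a = next(c for c in range(delta) if c not in at[u])   # free at u
        b = next(c for c in range(delta) if c not in at[v])   # free at v
        if a in at[v]:
            # Swap a and b on the a/b-alternating path starting at v. In a bipartite
            # graph this path cannot reach u, so afterwards a is free at both ends.
            path, x, col = [], v, a
            while col in at[x]:
                y = at[x][col]
                path.append((x, y, col))
                x, col = y, (b if col == a else a)
            for x, y, col in path:
                del at[x][col], at[y][col]
            for x, y, col in path:
                new = b if col == a else a
                at[x][new], at[y][new] = y, x
        at[u][a], at[v][a] = v, u
    return {frozenset((x, y)): col for x, nbrs in at.items() for col, y in nbrs.items()}

def best_color_class(edges, weights, delta):
    """Total positive weight of the heaviest color class (each class is a matching)."""
    coloring = konig_edge_coloring(edges, delta)
    totals = defaultdict(float)
    for (u, v), w in zip(edges, weights):
        if w > 0:
            totals[coloring[frozenset((u, v))]] += w
    return max(totals.values(), default=0.0), coloring

rng = random.Random(1)
edges = [(("L", i), ("R", j)) for i in range(6) for j in range(7) if rng.random() < 0.5]
weights = [rng.uniform(-1.0, 1.0) for _ in edges]
degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
delta = max(degree.values())
best, coloring = best_color_class(edges, weights, delta)
W = sum(w for w in weights if w > 0)
print(best >= W / delta)
```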

The weight of the matching found by the algorithm corresponds to how much we gain by increasing the parameter $t$. Now we can prove Lemma D.1.

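Proof: [Lemma D.1] Assume the algorithm finds a matching $M$. By Lemma D.2, its weight is

$$w(M) = \sum_{(i,\ell) \in M}\left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{b_\ell(i)}}\right) \ge \frac{1}{1-t}\, G(\mathbf{x}(t)).$$

If we denote by $\tilde{\mathbf{x}}(t)$ the fractional solution right after the "Temperature relaxation" phase, we have

$$\tilde{\mathbf{x}}(t) = \mathbf{x}(t) + \delta \sum_{(i,\ell) \in M}(\mathbf{e}_i - \mathbf{e}_{b_\ell(i)}).$$

Note that $\mathbf{x}(t + \delta)$ is obtained by applying fractional local search to $\tilde{\mathbf{x}}(t)$. This cannot decrease the value of $F$, and hence

$$F(\mathbf{x}(t + \delta)) - F(\mathbf{x}(t)) \ge F(\tilde{\mathbf{x}}(t)) - F(\mathbf{x}(t)) = F\Bigl(\mathbf{x}(t) + \delta \sum_{(i,\ell) \in M}(\mathbf{e}_i - \mathbf{e}_{b_\ell(i)})\Bigr) - F(\mathbf{x}(t)).$$

Observe that up to first-order approximation, this increment is given by the partial derivatives evaluated at $\mathbf{x}(t)$. By Lemma A.2, the second-order term is proportional to $\delta^2$:

$$F(\mathbf{x}(t + \delta)) - F(\mathbf{x}(t)) \ge \delta \sum_{(i,\ell) \in M}\left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_{b_\ell(i)}}\right) - n^2\delta^2 \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|,$$

and from above,

$$F(\mathbf{x}(t + \delta)) - F(\mathbf{x}(t)) \ge \frac{\delta}{1-t}\, G(\mathbf{x}(t)) - n^2\delta^2 \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|.$$

Multiplying by $\frac{1-t}{\delta} \le \frac{1}{\delta}$ gives the lemma. □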
It remains to relate $G(\mathbf{x}(t))$ to the optimum (recall that $\mathrm{OPT} = f(C)$), using the complementary solutions found in Step 9. In the next lemma, we show that $G(\mathbf{x})$ is lower bounded by the right-hand side of equation (4).

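Lemma D.3. Assume $\mathrm{OPT} = f(C)$, $\mathbf{x} \in P_t(\mathcal{M})$, $T_{\le\lambda}(\mathbf{x}) = \{i : x_i \le \lambda\}$, and the value of a local optimum on any of the subsets $T_{\le\lambda}(\mathbf{x})$ is at most $\beta$. Then

$$G(\mathbf{x}(t)) \ge \mathrm{OPT} - 2F(\mathbf{x}) - 2\beta t.$$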



Proof: Submodularity means that partial derivatives can only decrease when coordinates increase. Therefore by Lemma A.1,

$$F(\mathbf{x} \vee \mathbf{1}_C) \le F(\mathbf{x}) + \sum_{i \in C}(1 - x_i)\frac{\partial F}{\partial x_i},$$

and similarly

$$F(\mathbf{x} \wedge \mathbf{1}_C) \le F(\mathbf{x}) - \sum_{j \notin C} x_j \frac{\partial F}{\partial x_j}.$$

Combining these inequalities, we obtain

$$2F(\mathbf{x}(t)) + G(\mathbf{x}(t)) \ge F(\mathbf{x}(t) \vee \mathbf{1}_C) + F(\mathbf{x}(t) \wedge \mathbf{1}_C). \qquad (15)$$



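Let $A = \{i : x_i = t\}$ (and recall that $x_i \in [0, t]$ for all $i$). By applying the threshold lemma (see Lemma A.3 and the accompanying example with equation (9)), we have:

$$F(\mathbf{x}(t) \vee \mathbf{1}_C) \ge t\,\mathbb{E}\left[f(C \cup T_{>\lambda}(\mathbf{x})) \mid \lambda < t\right] + (1 - t) f(C). \qquad (16)$$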



By another application of Lemma A.3,

$$F(\mathbf{x}(t) \wedge \mathbf{1}_C) \ge t\,\mathbb{E}\left[f(T_{>\lambda}(\mathbf{x}) \cap C) \mid \lambda < t\right]. \qquad (17)$$



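(We discarded the term conditioned on $\lambda \ge t$, where $T_{>\lambda}(\mathbf{x}) = \emptyset$.) It remains to combine this with a suitable set in the complement of $T_{>\lambda}(\mathbf{x})$. Let $S_\lambda$ be a local optimum found inside $T_{\le\lambda}(\mathbf{x}) = \overline{T_{>\lambda}(\mathbf{x})}$. By Lemma 2.2 in [23], $f(S_\lambda)$ can be compared to any feasible subset of $T_{\le\lambda}(\mathbf{x})$, e.g. $C_\lambda = C \cap T_{\le\lambda}(\mathbf{x})$, as follows:

$$2f(S_\lambda) \ge f(S_\lambda \cup C_\lambda) + f(S_\lambda \cap C_\lambda) \ge f(S_\lambda \cup C_\lambda) = f(S_\lambda \cup (C \setminus T_{>\lambda}(\mathbf{x}))).$$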

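We assume that $f(S_\lambda) \le \beta$ for any $\lambda$. Let us take expectation over $\lambda \in [0,1]$ uniformly random:

$$2\beta \ge 2\,\mathbb{E}[f(S_\lambda) \mid \lambda < t] \ge \mathbb{E}[f(S_\lambda \cup (C \setminus T_{>\lambda}(\mathbf{x}))) \mid \lambda < t].$$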

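Now we can combine this with (16) and (17):

$$\begin{aligned} F(\mathbf{x}(t) \vee \mathbf{1}_C) + F(\mathbf{x}(t) \wedge \mathbf{1}_C) + 2\beta t &\ge (1 - t) f(C) + t\,\mathbb{E}\bigl[f(C \cup T_{>\lambda}(\mathbf{x})) + f(T_{>\lambda}(\mathbf{x}) \cap C) + f(S_\lambda \cup (C \setminus T_{>\lambda}(\mathbf{x}))) \bigm| \lambda < t\bigr] \\ &\ge (1 - t) f(C) + t\,\mathbb{E}\bigl[f(C \cup T_{>\lambda}(\mathbf{x})) + f(S_\lambda \cup C) \bigm| \lambda < t\bigr] \\ &\ge (1 - t) f(C) + t f(C) = f(C) = \mathrm{OPT}, \end{aligned}$$

where the last two inequalities follow from submodularity (and nonnegativity of $f$). Together with (15), this finishes the proof. □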
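Proof: [Lemma 4.5] By Lemmas D.1 and D.3 we get

$$\frac{1-t}{\delta}\bigl(\Phi(t + \delta) - \Phi(t)\bigr) = \frac{1-t}{\delta}\bigl(F(\mathbf{x}(t + \delta)) - F(\mathbf{x}(t))\bigr) \ge \mathrm{OPT} - 2F(\mathbf{x}) - 2\beta t - n^2\delta \sup\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right|.$$

We have $\left|\frac{\partial^2 F}{\partial x_i \partial x_j}\right| \le 2\max_S |f(S)| \le 2n\,\mathrm{OPT}$ (the last step because $f(S) \le \sum_{i \in S} f(\{i\}) \le n\,\mathrm{OPT}$ for a nonnegative submodular $f$), which implies the lemma. □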

Now by taking $\delta \to 0$, the statement of Lemma 4.5 leads naturally to the following differential equation:

$$(1 - t)\,\Phi'(t) \ge \mathrm{OPT} - 2\Phi(t) - 2t\beta.$$

This differential equation is very similar to the one we obtained in Section 3. Let us assume that $\mathrm{OPT} = 1$. Starting from an initial point $\Phi(t_0) = v_0$, the solution turns out to be

$$\Phi(t) \ge \frac{1}{2} + \beta - 2\beta t - \frac{(1-t)^2}{(1-t_0)^2}\left(\frac{1}{2} + \beta - 2\beta t_0 - v_0\right).$$



We start from initial conditions corresponding to the 0.309-approximation of [33]. It is proved in [33] that a fractional local optimum at $t_0 = (1 - t_0)^2 = \frac{1}{2}(3 - \sqrt{5})$ has value $v_0 \ge \frac{1}{2}(1 - t_0) \approx 0.309$. Therefore, we obtain the following solution for $t \ge \frac{1}{2}(3 - \sqrt{5})$:

$$\Phi(t) \ge \frac{1}{2} + \beta - 2\beta t - (1 - t)^2\left(\frac{1}{2} + \frac{2\beta}{3 - \sqrt{5}} - 2\beta\right).$$

We solve for $\beta$ such that the maximum of the right-hand side equals $\beta$. The solution is



$$\beta = \frac{1}{8}\left(2 + \sqrt{5}\right)\left(-5 + \sqrt{5} + \sqrt{-2 + 6\sqrt{5}}\right).$$

Then, for some value of $t$ (which turns out to be roughly 0.53), we have $\Phi(t) \ge \beta$. It can be verified that $\beta > 0.325$; this completes the proof of the 0.325-approximation.
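The closed form for $\beta$ can be checked numerically against the bound above. In $u = 1 - t$ the right-hand side reads $\frac{1}{2} - \beta + 2\beta u - Ku^2$ with $K = \frac{1}{2} + \frac{2\beta}{3 - \sqrt{5}} - 2\beta$, so its maximum $\frac{1}{2} - \beta + \beta^2/K$ is attained at $u = \beta/K$. The Python sketch evaluates this at the stated $\beta$; the helper names are illustrative.

```python
import math

SQRT5 = math.sqrt(5.0)
T0 = (3.0 - SQRT5) / 2.0        # satisfies t0 = (1 - t0)^2
V0 = (1.0 - T0) / 2.0           # starting value, about 0.309

def curvature(beta):
    """K = (1/2 + beta - 2 beta t0 - v0) / (1 - t0)^2, the coefficient of (1 - t)^2."""
    return (0.5 + beta - 2.0 * beta * T0 - V0) / (1.0 - T0) ** 2

def phi_bound(t, beta):
    """Right-hand side 1/2 + beta - 2 beta t - K (1 - t)^2 of the solution above."""
    return 0.5 + beta - 2.0 * beta * t - curvature(beta) * (1.0 - t) ** 2

def best_t(beta):
    """In u = 1 - t the bound is 1/2 - beta + 2 beta u - K u^2, maximized at u = beta / K."""
    return 1.0 - beta / curvature(beta)

beta = (2.0 + SQRT5) * (-5.0 + SQRT5 + math.sqrt(-2.0 + 6.0 * SQRT5)) / 8.0
t_star = best_t(beta)
print(round(beta, 6), round(t_star, 4), round(phi_bound(t_star, beta) - beta, 12))
```

The maximizer $t^* \approx 0.536$ lies above $t_0 \approx 0.382$, consistent with the "roughly 0.53" mentioned in the text.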

E Additional Hardness Results 

E.l Matroid base constraints 

It is shown in [34] that it is hard to approximate submodular maximization subject to a matroid base constraint with fractional base packing number $\nu = \ell/(\ell - 1)$, $\ell \in \mathbb{Z}$, better than $1/\ell$. We showed in Theorem 5.3 that for $\ell = 2$, the threshold of $1/2$ can be improved to $1 - e^{-1/2}$. More generally, we show the following.

Theorem E.1. There exist instances of the problem $\max\{f(S) : S \in \mathcal{B}\}$ such that a $(1 - e^{-1/\ell} + \epsilon)$-approximation for any $\epsilon > 0$ would require exponentially many value queries. Here $f(S)$ is a nonnegative submodular function, and $\mathcal{B}$ is a collection of bases in a matroid with fractional base packing number $\nu = \ell/(\ell - 1)$, $\ell \in \mathbb{Z}$.

Proof: Let $\nu = \frac{\ell}{\ell - 1}$. Consider the hypergraph $H$ in Figure 2 with $\ell$ instead of 2 hyperedges. Similarly, let $A$ ($B$) be the set of head (tail) vertices respectively, and let the feasible sets be those that contain $\ell - 1$ vertices of $A$ and one vertex of $B$ (i.e. $\mathcal{B} = \{S : |S \cap A| = \ell - 1 \text{ and } |S \cap B| = 1\}$). The optimum can simply select the heads of the first $\ell - 1$ hyperedges and one of the tails of the last one, thus the value $\mathrm{OPT} = 1$ remains unchanged. On the other hand, $\overline{\mathrm{OPT}}$ will decrease since the number of symmetric elements has increased and there is a greater chance to miss a hyperedge. Similar to the proof of Lemma 5.4 and Theorem 5.3, we obtain a unique symmetrized vector $\bar{\mathbf{x}} = (\frac{\ell-1}{\ell}, \ldots, \frac{\ell-1}{\ell}, \frac{1}{k\ell}, \ldots, \frac{1}{k\ell})$. Therefore,

$$\gamma = \overline{\mathrm{OPT}} = F(\bar{\mathbf{x}}) = \ell \cdot \frac{1}{\ell}\left[1 - \left(1 - \frac{1}{k\ell}\right)^{k}\right] \approx 1 - e^{-1/\ell}$$

for sufficiently large $k$. Also it is easy to see that the feasible sets of the refined instances, which are indeed the bases of a matroid, are those that contain a $(\ell - 1)/\ell$ fraction of the vertices in $A$ and a $1/(k\ell)$ fraction of the vertices in $B$. Therefore the fractional base packing number of the refined instances is equal to $\frac{\ell}{\ell - 1}$. □
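A quick numerical look at Theorem E.1, assuming unit-weight hyperedges with $k$ tails each as in the proof: the sketch computes $F(\bar{\mathbf{x}})$ for large $k$, compares it with $1 - e^{-1/\ell}$ and with the previous threshold $1/\ell$, and confirms $\mathrm{OPT} = 1$ by brute force on tiny instances. The vertex labels and function names are hypothetical.

```python
import math
from itertools import combinations

def symmetric_value(ell, k):
    """F(xbar) for the ell-hyperedge instance: each of the ell unit-weight hyperedges
    is cut iff its head (present w.p. (ell-1)/ell) is absent and at least one of
    its k tails (each present w.p. 1/(k ell)) is present."""
    head_absent = 1.0 - (ell - 1) / ell
    some_tail = 1.0 - (1.0 - 1.0 / (k * ell)) ** k
    return ell * head_absent * some_tail

def brute_force_opt(ell, k):
    """max f(S) over bases with ell - 1 heads and one tail (small cases only).
    Hypothetical labels: head j is ('h', j), tail i of hyperedge j is ('t', j, i)."""
    heads = [("h", j) for j in range(ell)]
    tails = [("t", j, i) for j in range(ell) for i in range(k)]
    best = 0
    for chosen in combinations(heads, ell - 1):
        for tail in tails:
            S = set(chosen) | {tail}
            cut = sum(1 for j in range(ell)
                      if ("h", j) not in S and any(("t", j, i) in S for i in range(k)))
            best = max(best, cut)
    return best

for ell in (2, 3, 5):
    print(ell, brute_force_opt(ell, 3), symmetric_value(ell, 10**6),
          1 - math.exp(-1 / ell), 1 / ell)
```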
Figure 4: Example for maximizing a submodular function subject to a matroid independence constraint; the hypergraph contains two directed hyperedges of weight $\alpha$ and the edge $(a, b)$ of weight $1 - \alpha$; the constraint is that we pick at most one vertex from each of $A$ and $B$.



E.2 Matroid independence constraint 

In this subsection we focus on the problem of maximizing a submodular function subject to a matroid independence constraint. Similar to Section 5, we construct our hard instances using directed hypergraphs.

Theorem E.2. There exist instances of the problem $\max\{f(S) : S \in \mathcal{I}\}$ where $f$ is nonnegative submodular and $\mathcal{I}$ are independent sets in a matroid such that a 0.478-approximation would require exponentially many value queries.



It is worth noting that the example we considered in Theorem 5.3 does not imply any hardness factor better than 1/2 for the matroid independence problem. The reason is that for the vector $\mathbf{x} = (0, 0, \frac{1}{2k}, \ldots, \frac{1}{2k})$, which is contained in the matroid polytope $P(\mathcal{M}_{A,B})$, the value of the multilinear relaxation exceeds 1/2. In other words, it is better for an algorithm not to select any vertex in the head set $A$, and to try to select as much as possible from $B$.

Instance 2. To resolve this issue, we perturb the instance by adding an undirected edge $(a, b)$ of weight $1 - \alpha$ and decreasing the weight of the hyperedges to $\alpha$, where the value of $\alpha$ will be optimized later (see Figure 4). The objective function is again the (directed) cut function, where the edge $(a, b)$ contributes $1 - \alpha$ if we pick exactly one of the vertices $a, b$. Therefore the value of the optimum remains unchanged: $\mathrm{OPT} = \alpha + (1 - \alpha) = 1$. On the other hand, the optimal symmetrized vector $\bar{x}$ must have a non-zero value on the head vertices; otherwise the edge $(a, b)$ would not contribute anything to $F(\bar{x})$.
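To sanity-check the claim $\mathrm{OPT} = \alpha + (1-\alpha) = 1$, the Python sketch below enumerates every subset of a small copy of the Figure 4 instance and also checks submodularity exhaustively. It rests on assumptions that this excerpt does not state explicitly: a directed hyperedge with head $h$ and tails $T$ is cut by $S$ when $h \notin S$ and $S \cap T \neq \emptyset$; $a$ and $b$ are the heads of the two hyperedges, with tails $a_1, \dots, a_k$ and $b_1, \dots, b_k$ respectively; and the integer vertex encoding is hypothetical.

```python
from itertools import combinations

def instance(k: int, alpha: float):
    """Hypothetical encoding: 0 = a, 1 = b, 2..k+1 = a_1..a_k, k+2..2k+1 = b_1..b_k."""
    hyperedges = [(0, range(2, k + 2), alpha),          # tails a_i -> head a
                  (1, range(k + 2, 2 * k + 2), alpha)]  # tails b_i -> head b
    undirected = (0, 1, 1 - alpha)                      # the edge (a, b)
    return hyperedges, undirected, 2 * k + 2

def cut(S, hyperedges, undirected) -> float:
    value = 0.0
    for head, tails, w in hyperedges:
        # ASSUMED semantics: cut iff the head is not picked and some tail is picked
        if head not in S and any(t in S for t in tails):
            value += w
    u, v, w = undirected
    if (u in S) != (v in S):  # exactly one endpoint of (a, b) picked
        value += w
    return value

def independent(S) -> bool:
    """At most one vertex of A = {a, b} and at most one vertex of B."""
    in_a = sum(1 for v in S if v < 2)
    return in_a <= 1 and len(S) - in_a <= 1

def all_subsets(n: int):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def brute_force_opt(k: int, alpha: float) -> float:
    hyperedges, undirected, n = instance(k, alpha)
    return max(cut(S, hyperedges, undirected) for S in all_subsets(n) if independent(S))

def is_submodular(k: int, alpha: float, tol: float = 1e-12) -> bool:
    hyperedges, undirected, n = instance(k, alpha)
    subsets = all_subsets(n)
    f = {S: cut(S, hyperedges, undirected) for S in subsets}
    return all(f[S] + f[T] + tol >= f[S | T] + f[S & T]
               for S in subsets for T in subsets)

print(brute_force_opt(3, 0.3513), is_submodular(3, 0.3513))
```

For instance, $S = \{b, a_1\}$ cuts the hyperedge into $a$ and the edge $(a, b)$, which is where the value $\alpha + (1 - \alpha)$ comes from.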

Proof (of Theorem E.2). Let $H$ be the hypergraph of Figure 4 and consider the problem $\max\{f(S) : S \in \mathcal{I}\}$, where $f$ is the cut function of $H$ and $\mathcal{I}$ is the collection of independent sets of the matroid $\mathcal{M}_{A,B}$, in which we may pick at most one vertex from each of $A$ and $B$. Observe that Lemma 5.4 applies to this instance as well, so we may use the same symmetrization formula to obtain the symmetrized vectors $\bar{x}$. Moreover, the matroid polytope is described by the following constraints:

$$x \in P(\mathcal{M}_{A,B}) \iff \begin{cases} x_a + x_b \le 1, \\ \sum_{i=1}^{k} (x_{a_i} + x_{b_i}) \le 1. \end{cases} \qquad (18)$$



Since the vertices of the set $B$ contribute only as tails of hyperedges, the value of $F(\bar{x})$ can only increase if we increase the value of $x$ on the vertices in $B$. Therefore, we can assume (using equations (16) and (18)) that

$$x_a = x_b \le \frac{1}{2}, \qquad x_{a_i} = x_{b_i} = \frac{1}{2k} \quad (1 \le i \le k).$$
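The closed form for $F(\bar{x})$ that the proof uses next can be checked against the definition of the multilinear extension, $F(x) = \mathbb{E}[f(R)]$, where $R$ contains each vertex $i$ independently with probability $x_i$. The Python sketch below does this by exhaustive enumeration for a small $k$ at a symmetric point $x_a = x_b = q$, $x_{a_i} = x_{b_i} = p$. It assumes the same hyperedge semantics as before (cut when the head is absent and some tail is present) and a hypothetical vertex encoding (0 = $a$, 1 = $b$, then the tails of $a$, then the tails of $b$).

```python
from itertools import product

def cut(S, k: int, alpha: float) -> float:
    """ASSUMED cut function of the Figure 4 hypergraph (hypothetical encoding)."""
    tails_a = range(2, k + 2)
    tails_b = range(k + 2, 2 * k + 2)
    value = 0.0
    if 0 not in S and any(t in S for t in tails_a):
        value += alpha                      # hyperedge into a is cut
    if 1 not in S and any(t in S for t in tails_b):
        value += alpha                      # hyperedge into b is cut
    if (0 in S) != (1 in S):
        value += 1 - alpha                  # edge (a, b) is cut
    return value

def multilinear(x, k: int, alpha: float) -> float:
    """E[f(R)] with R containing vertex i independently with probability x[i]."""
    total = 0.0
    for bits in product((0, 1), repeat=len(x)):
        prob = 1.0
        for xi, bit in zip(x, bits):
            prob *= xi if bit else 1 - xi
        S = {i for i, bit in enumerate(bits) if bit}
        total += prob * cut(S, k, alpha)
    return total

def closed_form(q: float, p: float, k: int, alpha: float) -> float:
    """2*alpha*(1-q)*(1-(1-p)^k) + (1-alpha)*2q(1-q)."""
    return 2 * alpha * (1 - q) * (1 - (1 - p) ** k) + 2 * (1 - alpha) * q * (1 - q)

k, alpha, q = 3, 0.35, 0.4
p = 1 / (2 * k)
x = [q, q] + [p] * (2 * k)
print(multilinear(x, k, alpha), closed_form(q, p, k, alpha))
```

The enumeration has $2^{2k+2}$ terms, so it is only practical for small $k$; it serves as a check of the formula, not as a way to evaluate it.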
Let $x_a = q$; we may compute the value of $\overline{\mathrm{OPT}}$ as follows:

$$\overline{\mathrm{OPT}} = F(\bar{x}) = 2\alpha\left[(1-q)\left(1-\left(1-\frac{1}{2k}\right)^{k}\right)\right] + (1-\alpha)\bigl[2q(1-q)\bigr],$$

where $q \le 1/2$ is chosen to maximize this expression. As $k \to \infty$, $\left(1 - \frac{1}{2k}\right)^k \to e^{-1/2}$. By optimizing numerically over $\alpha$, we find that the smallest value of $\overline{\mathrm{OPT}}$ is obtained when $\alpha \approx 0.3513$; in this case $\gamma = \overline{\mathrm{OPT}} \approx 0.4773$. Also, similarly to Theorem 5.3, the refined instances are in fact instances of submodular maximization over the independent sets of a matroid (a partition matroid whose ground set is partitioned into $A \cup B$, where we may take at most half of the elements of $A$ and at most a $1/(2k)$ fraction of the elements of $B$). □
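The numerical optimization over $\alpha$ can be reproduced with a short Python sketch. It works in the limit $k \to \infty$, replacing $(1 - \frac{1}{2k})^k$ by $e^{-1/2}$. For fixed $\alpha < 1$ the objective is a concave quadratic in $q$, so its maximizer over $[0, 1/2]$ is the stationary point clamped to that interval; the outer minimization uses a plain grid over $\alpha$, which is our choice of method and not necessarily the one used by the authors.

```python
import math

C = 1 - math.exp(-0.5)  # limit of 1 - (1 - 1/(2k))^k as k -> infinity

def objective(q: float, alpha: float) -> float:
    return 2 * alpha * (1 - q) * C + 2 * (1 - alpha) * q * (1 - q)

def best_over_q(alpha: float) -> float:
    """max over 0 <= q <= 1/2 of objective(q, alpha), for 0 <= alpha < 1.

    Setting the derivative -2*alpha*C + 2*(1-alpha)*(1-2q) to zero gives
    q* = (1 - alpha*C/(1-alpha)) / 2, which is clamped to [0, 1/2].
    """
    q = 0.5 * (1 - alpha * C / (1 - alpha))
    q = min(max(q, 0.0), 0.5)
    return objective(q, alpha)

alphas = [i / 10000 for i in range(1, 10000)]
best_alpha = min(alphas, key=best_over_q)
print(f"alpha ~ {best_alpha:.4f}, gamma ~ {best_over_q(best_alpha):.4f}")
```

The printed values should land close to the $\alpha \approx 0.3513$ and $\gamma \approx 0.4773$ stated above; the grid resolution of $10^{-4}$ limits how precisely $\alpha$ is located.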



E.3 Cardinality constraint 

Although we do not know how to improve the hardness of approximating general submodular functions 
without any additional constraint to a value smaller than 1/2, we can show that adding a simple cardinality 
constraint makes the problem harder. In particular, we show that it is hard to approximate a submodular 
function subject to a cardinality constraint within a factor of 0.491. 

Corollary E.3. There exist instances of the problem $\max\{f(S) : |S| \le \ell\}$ with $f$ nonnegative submodular such that a 0.491-approximation would require exponentially many value queries.

We remark that a related problem, $\max\{f(S) : |S| = k\}$, is at least as difficult to approximate: we can reduce $\max\{f(S) : |S| \le \ell\}$ to it by trying all possible values $k = 0, 1, 2, \dots, \ell$.
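The reduction in this remark can be sketched in Python as follows. Here `max_exact` is a hypothetical brute-force stand-in for an algorithm for $\max\{f(S) : |S| = k\}$ (an approximation algorithm could be plugged in instead, and its guarantee carries over), and the toy objective, the cut function of a single edge, is chosen only because its optimum under $|S| \le 2$ is not attained at $|S| = 2$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

SetFn = Callable[[FrozenSet[int]], float]

def max_exact(f: SetFn, ground: Sequence[int], k: int) -> Tuple[float, FrozenSet[int]]:
    """Hypothetical oracle for max{f(S) : |S| = k}, implemented here by brute force."""
    best = max((frozenset(c) for c in combinations(ground, k)), key=f)
    return f(best), best

def max_at_most(f: SetFn, ground: Sequence[int], ell: int,
                exact=max_exact) -> Tuple[float, FrozenSet[int]]:
    """Solve max{f(S) : |S| <= ell} by calling the exact-size oracle for k = 0..ell."""
    answers = [exact(f, ground, k) for k in range(min(ell, len(ground)) + 1)]
    return max(answers, key=lambda ans: ans[0])

def edge_cut(S: FrozenSet[int]) -> float:
    """Cut function of the single edge (0, 1): nonnegative, submodular, non-monotone."""
    return 1.0 if (0 in S) != (1 in S) else 0.0

value, S = max_at_most(edge_cut, [0, 1], ell=2)
print(value, sorted(S))
```

Calling the oracle only with $k = \ell = 2$ would return value 0 here, which is why all sizes up to $\ell$ are tried.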

Proof: Let $\ell = 2$, let $H$ be the hypergraph considered in the previous theorem, and let $f$ be the cut function of $H$. Similarly to the proof of Theorem E.2, we have $\mathrm{OPT} = 1$, and we may use the same symmetrization formula to obtain $\bar{x}$. In this case the feasibility polytope is

$$x \in P(|S| \le 2) \iff x_a + x_b + \sum_{i=1}^{k} (x_{a_i} + x_{b_i}) \le 2; \qquad (19)$$

however, we may assume that equality holds at the maximizer of $F(\bar{x})$, since otherwise we can simply increase the $x$ value of the tail vertices in $B$, which can only increase $F(\bar{x})$. Let $x_a = q$, $x_{a_i} = p$ and $z = kp$. Using the symmetry of $\bar{x}$ and equality in (19), we have

$$2q + 2kp = 2 \implies kp = z = 1 - q.$$



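Finally, we can compute the value of $\overline{\mathrm{OPT}}$ (using $(1-p)^k = (1 - z/k)^k \to e^{-z}$ as $k \to \infty$):

$$\overline{\mathrm{OPT}} = F(\bar{x}) = 2\alpha\left[(1-q)\left(1-(1-p)^{k}\right)\right] + (1-\alpha)\bigl[2q(1-q)\bigr] \simeq 2\alpha z\left(1-e^{-z}\right) + 2(1-\alpha)z(1-z).$$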



Again, by optimizing over $\alpha$ (with $z \in [0, 1]$ chosen to maximize $F(\bar{x})$), the smallest value of $\overline{\mathrm{OPT}}$ is obtained when $\alpha \approx 0.15$; in this case $\gamma \approx 0.49098$. The refined instances are instances of submodular maximization subject to a cardinality constraint, where the constraint is to choose at most a $\frac{1}{k+1}$ fraction of all the elements in the ground set. □
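The same kind of numerical check applies to the cardinality case. The Python sketch below uses the limiting objective $2\alpha z(1-e^{-z}) + 2(1-\alpha)z(1-z)$, maximizes it over a grid of $z \in [0, 1]$, and minimizes the result over a grid of $\alpha$. Restricting $\alpha$ to $(0, 1/2]$ is our own choice: for $\alpha > 1/2$, picking two tails would give value $2\alpha > 1$, so $\mathrm{OPT} = 1$ would no longer hold. The grid method is also an assumption, not necessarily how the reported numbers were obtained.

```python
import math

def objective(z: float, alpha: float) -> float:
    """Limit k -> infinity of F(x_bar), with z = k*p = 1 - q."""
    return 2 * alpha * z * (1 - math.exp(-z)) + 2 * (1 - alpha) * z * (1 - z)

zs = [i / 1000 for i in range(1001)]        # grid over z in [0, 1]

def best_over_z(alpha: float) -> float:
    return max(objective(z, alpha) for z in zs)

alphas = [i / 1000 for i in range(1, 501)]  # grid over alpha in (0, 1/2]
best_alpha = min(alphas, key=best_over_z)
print(f"alpha ~ {best_alpha:.3f}, gamma ~ {best_over_z(best_alpha):.5f}")
```

The output should be near $\alpha \approx 0.15$ and $\gamma \approx 0.49098$; the inner maximum varies slowly with $\alpha$ near the optimum, so the location of the minimizing $\alpha$ is less sharply determined than the value of $\gamma$.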